ijson

3.5.0 verified Tue May 12 auth: no python install: verified quickstart: verified

ijson is an iterative JSON parser for Python that exposes standard iterator interfaces. It processes large JSON streams without loading the entire document into memory, which makes it well suited to massive JSON files, streaming APIs, and memory-constrained environments. The library is currently at version 3.5.0, is actively maintained, and ships binary wheels for major platforms.

pip install ijson
error ModuleNotFoundError: No module named 'ijson'
cause The 'ijson' package is not installed in your Python environment or the Python interpreter being used does not have access to the installed package.
fix
Install the package using pip: pip install ijson
error ijson.common.IncompleteJSONError: parse error: trailing garbage
cause This error occurs when the input JSON stream contains multiple top-level JSON objects or values, which is not standard JSON but is common in JSON line-delimited streams or concatenated JSON documents.
fix
Pass the multiple_values=True option to the ijson parsing function you are using (e.g., ijson.items(f, 'prefix', multiple_values=True)).
error ijson.common.IncompleteJSONError: lexical error: invalid char in json text
cause This typically indicates that the input data is not valid JSON, contains invalid characters (e.g., non-UTF-8 bytes), or includes non-standard JSON values like `NaN`.
fix
Ensure your input data is strictly valid JSON and properly UTF-8 encoded. If the data contains invalid byte sequences, pre-process it with a tool such as `iconv -f utf8 -t utf8 -c`, or, if you decode bytes to text yourself, pass `errors='ignore'` or `errors='replace'` to the decode step.
error YAJL shared object not found
cause The ijson library, particularly its faster backends (`yajl2_c`, `yajl2_cffi`, `yajl2`), relies on the YAJL C library, which is not found or correctly linked on your system.
fix
Install the YAJL development libraries (e.g., `sudo apt-get install libyajl-dev` on Debian/Ubuntu, `brew install yajl` on macOS) and then reinstall ijson. Alternatively, select the pure Python backend explicitly with `import ijson.backends.python as ijson`.
error TypeError: can't concat bytes to str
cause This error arises when you mix byte-string (binary) and regular string (text) data during file processing or when `ijson` expects binary input but receives text, or vice-versa, often due to how the input file is opened.
fix
Open your JSON file in binary read mode (`'rb'`) when passing it directly to ijson functions, since ijson expects binary input. Example: `with open('data.json', 'rb') as f: ...`
gotcha Default backend choice can impact performance significantly. By default, `import ijson` will attempt to use available C-based backends (`yajl2_c`, `yajl2_cffi`, `yajl2`) in order, falling back to the pure Python backend if none are found or can be compiled. The pure Python backend is notably slower.
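fix For performance-critical applications, explicitly import the fastest available backend (e.g., `import ijson.backends.yajl2_c as ijson`) or set the `IJSON_BACKEND` environment variable before importing ijson. `yajl2_c` is included in the binary wheels; building it from source requires a compiler, while `yajl2_cffi` and `yajl2` need the YAJL shared library installed on the system.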
gotcha `ijson` expects binary file-like objects (e.g., `open('file.json', 'rb')`). While it can accept text-mode file objects (`'r'`), it will internally encode the strings to UTF-8 bytes, which can incur a performance penalty and issues a warning that is not visible by default.
fix Always open JSON files in binary read mode: `with open('large_file.json', 'rb') as f: ...`. Ensure custom file-like objects return `bytes` from their `read()` method.
breaking Malformed or incomplete JSON data leads to `ijson.common.JSONError` or `ijson.common.IncompleteJSONError`, at which point the parser gives up. It cannot automatically recover or skip to the next 'valid' record, as the error fundamentally invalidates the stream's structure from the parser's perspective.
fix Ensure the input JSON is well-formed. If dealing with potentially malformed streams, implement robust error handling in the data generation source, or pre-process/sanitize the JSON stream before feeding it to `ijson`. Custom error recovery for partial JSON (like with LLM outputs) might require a different library or manual stream manipulation.
gotcha The special prefix `'.item'` is used to denote items within a JSON array. If your actual JSON data contains a key named 'item' directly within an object that is part of an array, this could lead to ambiguity or unexpected parsing behavior.
fix Be mindful of your JSON structure and chosen prefixes. If a collision is likely, use more specific prefixes to navigate around the 'item' keyword or consider using the lower-level `parse` function for finer-grained control over events.
pip install ijson[yajl2_c]
pip install ijson[yajl2_cffi]
| python | os / libc | variant | wheel | install | import | disk |
|---|---|---|---|---|---|---|
| 3.10 | alpine (musl) | ijson | wheel | - | 0.02s | 18.3M |
| 3.10 | alpine (musl) | yajl2_c | wheel | - | 0.02s | 18.3M |
| 3.10 | alpine (musl) | yajl2_cffi | wheel | - | 0.02s | 18.3M |
| 3.10 | alpine (musl) | ijson | - | - | 0.02s | 18.3M |
| 3.10 | alpine (musl) | yajl2_c | - | - | 0.02s | 18.3M |
| 3.10 | alpine (musl) | yajl2_cffi | - | - | 0.02s | 18.3M |
| 3.10 | slim (glibc) | ijson | wheel | 1.6s | 0.01s | 19M |
| 3.10 | slim (glibc) | yajl2_c | wheel | 1.6s | 0.01s | 19M |
| 3.10 | slim (glibc) | yajl2_cffi | wheel | 1.6s | 0.01s | 19M |
| 3.10 | slim (glibc) | ijson | - | - | 0.01s | 19M |
| 3.10 | slim (glibc) | yajl2_c | - | - | 0.01s | 19M |
| 3.10 | slim (glibc) | yajl2_cffi | - | - | 0.01s | 19M |
| 3.11 | alpine (musl) | ijson | wheel | - | 0.04s | 20.2M |
| 3.11 | alpine (musl) | yajl2_c | wheel | - | 0.04s | 20.2M |
| 3.11 | alpine (musl) | yajl2_cffi | wheel | - | 0.04s | 20.2M |
| 3.11 | alpine (musl) | ijson | - | - | 0.05s | 20.2M |
| 3.11 | alpine (musl) | yajl2_c | - | - | 0.04s | 20.2M |
| 3.11 | alpine (musl) | yajl2_cffi | - | - | 0.04s | 20.2M |
| 3.11 | slim (glibc) | ijson | wheel | 1.6s | 0.03s | 21M |
| 3.11 | slim (glibc) | yajl2_c | wheel | 1.6s | 0.03s | 21M |
| 3.11 | slim (glibc) | yajl2_cffi | wheel | 1.6s | 0.03s | 21M |
| 3.11 | slim (glibc) | ijson | - | - | 0.03s | 21M |
| 3.11 | slim (glibc) | yajl2_c | - | - | 0.03s | 21M |
| 3.11 | slim (glibc) | yajl2_cffi | - | - | 0.03s | 21M |
| 3.12 | alpine (musl) | ijson | wheel | - | 0.04s | 12.1M |
| 3.12 | alpine (musl) | yajl2_c | wheel | - | 0.04s | 12.1M |
| 3.12 | alpine (musl) | yajl2_cffi | wheel | - | 0.04s | 12.1M |
| 3.12 | alpine (musl) | ijson | - | - | 0.05s | 12.1M |
| 3.12 | alpine (musl) | yajl2_c | - | - | 0.04s | 12.1M |
| 3.12 | alpine (musl) | yajl2_cffi | - | - | 0.04s | 12.1M |
| 3.12 | slim (glibc) | ijson | wheel | 1.5s | 0.04s | 13M |
| 3.12 | slim (glibc) | yajl2_c | wheel | 1.5s | 0.05s | 13M |
| 3.12 | slim (glibc) | yajl2_cffi | wheel | 1.5s | 0.04s | 13M |
| 3.12 | slim (glibc) | ijson | - | - | 0.04s | 13M |
| 3.12 | slim (glibc) | yajl2_c | - | - | 0.04s | 13M |
| 3.12 | slim (glibc) | yajl2_cffi | - | - | 0.04s | 13M |
| 3.13 | alpine (musl) | ijson | wheel | - | 0.05s | 11.8M |
| 3.13 | alpine (musl) | yajl2_c | wheel | - | 0.04s | 11.8M |
| 3.13 | alpine (musl) | yajl2_cffi | wheel | - | 0.04s | 11.8M |
| 3.13 | alpine (musl) | ijson | - | - | 0.05s | 11.7M |
| 3.13 | alpine (musl) | yajl2_c | - | - | 0.04s | 11.7M |
| 3.13 | alpine (musl) | yajl2_cffi | - | - | 0.04s | 11.7M |
| 3.13 | slim (glibc) | ijson | wheel | 1.5s | 0.05s | 12M |
| 3.13 | slim (glibc) | yajl2_c | wheel | 1.5s | 0.05s | 12M |
| 3.13 | slim (glibc) | yajl2_cffi | wheel | 1.5s | 0.05s | 12M |
| 3.13 | slim (glibc) | ijson | - | - | 0.04s | 12M |
| 3.13 | slim (glibc) | yajl2_c | - | - | 0.04s | 12M |
| 3.13 | slim (glibc) | yajl2_cffi | - | - | 0.04s | 12M |
| 3.9 | alpine (musl) | ijson | wheel | - | 0.02s | 17.8M |
| 3.9 | alpine (musl) | yajl2_c | wheel | - | 0.02s | 17.8M |
| 3.9 | alpine (musl) | yajl2_cffi | wheel | - | 0.02s | 17.8M |
| 3.9 | alpine (musl) | ijson | - | - | 0.02s | 17.8M |
| 3.9 | alpine (musl) | yajl2_c | - | - | 0.02s | 17.8M |
| 3.9 | alpine (musl) | yajl2_cffi | - | - | 0.02s | 17.8M |
| 3.9 | slim (glibc) | ijson | wheel | 1.9s | 0.02s | 18M |
| 3.9 | slim (glibc) | yajl2_c | wheel | 1.9s | 0.02s | 18M |
| 3.9 | slim (glibc) | yajl2_cffi | wheel | 1.9s | 0.02s | 18M |
| 3.9 | slim (glibc) | ijson | - | - | 0.02s | 18M |
| 3.9 | slim (glibc) | yajl2_c | - | - | 0.02s | 18M |
| 3.9 | slim (glibc) | yajl2_cffi | - | - | 0.02s | 18M |

This quickstart demonstrates how to use `ijson.items` to iteratively parse JSON data, extracting Python objects from specified paths. It also shows how to explicitly select a backend for improved performance. `ijson` expects file-like objects opened in binary mode (`'rb'`). The path syntax uses `.` for object keys and `.item` for elements within arrays.

import ijson
from io import BytesIO

# Example JSON data (simulating a file-like object)
json_data = b'{"earth": {"europe": [{"name": "Paris", "type": "city"}, {"name": "Rome", "type": "city"}]}, "america": [{"name": "New York", "type": "city"}]}'

# For demonstration, you might use BytesIO or a real file opened in binary mode
with BytesIO(json_data) as f:
    # Using the 'items' function to extract objects under a specific path
    # 'earth.europe.item' means: 'earth' object, then 'europe' array, then each 'item' in the array
    print("European cities:")
    for city in ijson.items(f, 'earth.europe.item'):
        print(city)

# ijson prefixes are exact paths; there is no wildcard syntax. To collect
# cities from several arrays, parse a fresh stream once per prefix
# (or reopen the file each time).
print("\nAll cities:")
for prefix in ('earth.europe.item', 'america.item'):
    with BytesIO(json_data) as f:
        for city in ijson.items(f, prefix):
            if isinstance(city, dict) and city.get('type') == 'city':
                print(city)

# Example with explicit backend selection (recommended for production)
# Ensure 'ijson[yajl2_cffi]' is installed for this to be effective
try:
    import ijson.backends.yajl2_cffi as ijson_fast
    with BytesIO(json_data) as f:
        print("\nEuropean cities (with yajl2_cffi backend):")
        for city in ijson_fast.items(f, 'earth.europe.item'):
            print(city)
except ImportError:
    print("\n'yajl2_cffi' backend not available. Install with 'pip install ijson[yajl2_cffi]'.")