ijson
ijson is an iterative JSON parser for Python with standard iterator interfaces. It processes large JSON data streams without loading the entire document into memory, which suits massive JSON files, streaming APIs, and memory-constrained environments. The library is currently at version 3.5.0, with regular releases and binary wheels for major platforms.
Warnings
- gotcha The default backend choice can significantly affect performance. `import ijson` tries the C-based backends (`yajl2_c`, `yajl2_cffi`, `yajl2`) in that order and falls back to the pure Python backend if none of them can be loaded. The pure Python backend is notably slower.
- gotcha `ijson` expects binary file-like objects (e.g., `open('file.json', 'rb')`). It also accepts text-mode file objects (`'r'`), but it then encodes the strings to UTF-8 bytes internally, which can incur a performance penalty, and it issues a warning that is not visible by default.
- breaking Malformed or incomplete JSON data leads to `ijson.common.JSONError` or `ijson.common.IncompleteJSONError`, at which point the parser gives up. It cannot automatically recover or skip to the next 'valid' record, as the error fundamentally invalidates the stream's structure from the parser's perspective.
- gotcha The path component `item` denotes elements of a JSON array (for example, `earth.europe.item`). An object key that is literally named `item` produces the same component, so a prefix such as `item.item` can refer either to an array element or to that key, which can lead to ambiguous or unexpected matches.
Install
- pip install ijson
- pip install ijson[yajl2_c]
- pip install ijson[yajl2_cffi]
Imports
- ijson
import ijson
- items
from ijson import items
- parse
from ijson import parse
- yajl2_cffi
import ijson.backends.yajl2_cffi as ijson
Quickstart
import ijson
from io import BytesIO

# Example JSON data (simulating a file-like object)
json_data = b'{"earth": {"europe": [{"name": "Paris", "type": "city"}, {"name": "Rome", "type": "city"}], "america": [{"name": "New York", "type": "city"}]}}'

# For demonstration, use BytesIO; a real file should be opened in binary mode ('rb')
with BytesIO(json_data) as f:
    # Using the 'items' function to extract objects under a specific path
    # 'earth.europe.item' means: 'earth' object, then 'europe' array, then each element of the array
    print("European cities:")
    for city in ijson.items(f, 'earth.europe.item'):
        print(city)

# The stream is consumed after one pass, so open a new one for another parse
with BytesIO(json_data) as f:
    print("\nAll cities:")
    # Prefixes have no wildcard syntax, so iterate over the members of 'earth' with kvitems,
    # which yields (key, value) pairs such as ('europe', [...]) and ('america', [...])
    for continent, places in ijson.kvitems(f, 'earth'):
        for place in places:
            if isinstance(place, dict) and place.get('type') == 'city':
                print(place)

# Example with explicit backend selection (recommended for production)
# Ensure 'ijson[yajl2_cffi]' is installed for this to be effective
try:
    import ijson.backends.yajl2_cffi as ijson_fast
    with BytesIO(json_data) as f:
        print("\nEuropean cities (with yajl2_cffi backend):")
        for city in ijson_fast.items(f, 'earth.europe.item'):
            print(city)
except ImportError:
    print("\n'yajl2_cffi' backend not available. Install with 'pip install ijson[yajl2_cffi]'.")