JSON Stream
json-stream is a Python library (version 2.5.0, actively maintained) designed for efficient streaming JSON encoding and decoding. It allows processing JSON data in chunks, rather than loading the entire document into memory, which significantly reduces memory consumption and latency for large files or network streams. It provides a Pythonic dict/list-like interface for reading and uses generators for writing, making it suitable for web applications, data pipelines, and other scenarios requiring optimized JSON handling.
Warnings
- gotcha In the default transient mode (e.g., `json_stream.load(file_obj)`), data is discarded once it has been read. Accessing an element that has already been consumed (e.g., reading `data['items'][0]` a second time) raises a `TransientAccessException`. Pass `persistent=True` to `json_stream.load()` if values must be revisited, at the cost of keeping them in memory.
- gotcha Calling `.json()` on a streamed response, as in `requests.get(url, stream=True).json()`, still reads the entire JSON payload into memory before parsing, because the `requests` library's `.json()` method is not stream-aware. To parse incrementally, pass the response object to `json_stream.requests.load()` instead.
- gotcha json-stream does not change how the standard library's `json.dump()` and `json.dumps()` handle a regular Python `dict` or `list`: the whole structure must already exist in memory before encoding starts, even if other parts of your application use `json-stream`. To stream output, wrap generators with `streamable_dict` or `streamable_list`.
- gotcha `json-stream` expects strict JSON. Common syntax errors (trailing commas, single quotes instead of double quotes around keys or strings, unquoted keys, comments) are not permitted and can lead to a `JSONDecodeError`.
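The strict-JSON rules in the last warning can be checked quickly with the standard library's `json` module. The sketch below is only an illustration: it uses `json.loads()` rather than json-stream itself, the sample strings are made up, and the exact exception type and message raised by json-stream's own parser may differ.

```python
import json

# Each made-up snippet breaks one of the strict-JSON rules listed above.
invalid_snippets = {
    "trailing comma": '{"a": 1,}',
    "single quotes": "{'a': 1}",
    "unquoted key": '{a: 1}',
    "comment": '{"a": 1} // note',
}

failures = {}
for label, text in invalid_snippets.items():
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        # exc.msg describes the problem; exc.pos is the character offset.
        failures[label] = (exc.msg, exc.pos)

for label, (msg, pos) in failures.items():
    print(f"{label}: {msg} (at character {pos})")
```

Validating a small sample of the input this way before streaming a large file can surface malformed data early.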
Install
- pip install json-stream
- pip install json-stream[rs]
- pip install json-stream[requests]
Imports
- load
import json_stream
data = json_stream.load(file_object)
- streamable_dict
from json_stream.writer import streamable_dict
- streamable_list
from json_stream.writer import streamable_list
Quickstart
import json_stream
from json_stream.writer import streamable_dict, streamable_list
import io
import json
# --- Reading JSON (Decoding) ---
json_data_str = '{"name": "Alice", "items": [1, 2, 3], "settings": {"active": true}}'
# Simulate a file-like object for streaming
json_stream_input = io.StringIO(json_data_str)
# Load the stream in transient mode (default)
data = json_stream.load(json_stream_input)
# Access data in document order - in transient mode values are parsed as they are accessed
name = data['name']
first_item = data['items'][0]
setting_active = data['settings']['active']
print(f"Decoded Name: {name}")
print(f"Decoded First Item: {first_item}")
print(f"Decoded Setting Active: {setting_active}")
# --- Writing JSON (Encoding) ---
def generate_items():
    # Yield list items one at a time
    for i in range(3):
        yield i + 1
def generate_data():
    # Yield (key, value) pairs for the top-level object
    yield 'id', 123
    yield 'status', 'processed'
    yield 'results', streamable_list(generate_items())
# Use streamable_dict for the top-level object
streaming_output = streamable_dict(generate_data())
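# Dump to a file-like object using the standard json module.
# json.dump() iterates the streamable wrappers lazily; json.dumps() may use
# CPython's C encoder, which can treat the wrappers as empty containers.
json_stream_output = io.StringIO()
json.dump(streaming_output, json_stream_output)
encoded_json = json_stream_output.getvalue()
print(f"Encoded JSON: {encoded_json}")
# encoded_json holds the complete JSON document once json.dump() returns.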