StreamingJSON
StreamingJSON is a Python library that preprocesses incomplete JSON strings, completing them into valid, parseable JSON in real time. This addresses a common problem in streaming JSON parsing, particularly with Large Language Model (LLM) output: data can be processed as it arrives instead of after the full JSON document has been generated. Because the library only completes the fragment, any standard JSON library can parse its output. The library is under active development and releases frequently; the latest release is 0.0.5.
Warnings
- gotcha Each new JSON stream requires a fresh `streamingjson.Lexer()` instance. Reusing a `Lexer` instance for a different stream without re-initialization can lead to incorrect parsing results or unexpected behavior.
- gotcha StreamingJSON acts as a *preprocessor* to complete incomplete JSON strings into syntactically valid JSON. It does not directly provide an object-streaming interface (like `ijson` or `json-stream`) that yields Python objects as they are parsed from a raw stream. Users must use a standard JSON library (e.g., Python's `json` module) on the *output* of `complete_json()` to convert it into Python objects.
- breaking The library is in an early development stage (0.0.x series). APIs, internal structures, and behavior may change significantly in any release, including minor and patch releases, without a major version bump.
- gotcha A bug in earlier versions (fixed in 0.0.5) caused parsing errors when a JSON string contained a literal slash character (`/`). Although the bug is fixed, it points to the potential for subtle edge cases in how string content is handled, especially complex escape sequences or Unicode.
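The first two warnings can be combined into one small pattern: create a fresh `Lexer` for each stream, and pass the output of `complete_json()` to the standard `json` module to get Python objects. The following is a minimal sketch, not part of the library; the names `iter_snapshots`, `_new_lexer`, and `lexer_factory` are hypothetical, and skipping a snapshot that `json.loads` rejects is an assumption about how a caller might want to behave.

```python
import json
from typing import Any, Callable, Iterable, Iterator


def _new_lexer() -> Any:
    # Imported lazily so the helper can also be driven by a stand-in lexer.
    import streamingjson

    return streamingjson.Lexer()


def iter_snapshots(
    chunks: Iterable[str],
    lexer_factory: Callable[[], Any] = _new_lexer,
) -> Iterator[Any]:
    """Yield a parsed Python object after each chunk of ONE JSON stream."""
    lexer = lexer_factory()  # fresh lexer: never shared across streams
    for chunk in chunks:
        lexer.append_string(chunk)
        completed = lexer.complete_json()  # completed JSON text, still a str
        try:
            yield json.loads(completed)  # the str -> Python object step
        except json.JSONDecodeError:
            # Assumption: skip a snapshot that the standard parser rejects.
            continue
```

Each call to `iter_snapshots(...)` handles one stream, for example `for snapshot in iter_snapshots(['{"a":', '[tr', 'ue]}']): print(snapshot)`. Calling it again for the next stream creates a new lexer, so no state carries over.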
Install
pip install streamingjson
Imports
- Lexer
from streamingjson import Lexer
Quickstart
import streamingjson
# NOTE: A new Lexer instance is required for each JSON stream.
lexer = streamingjson.Lexer()
# Append initial JSON segment
lexer.append_string('{"a":')
print(lexer.complete_json()) # Expected: {"a":null}
# Append more JSON segments
lexer.append_string('[tr')
print(lexer.complete_json()) # Expected: {"a":[true]}
lexer.append_string('ue], "b": "hello')
print(lexer.complete_json()) # Expected: {"a":[true], "b": "hello"}
# Example with escaped characters
new_lexer = streamingjson.Lexer()
# Raw string literal, so the backslashes reach the lexer instead of being consumed by Python
new_lexer.append_string(r'{"key": "value with \"quote\" and \\slash"')
print(new_lexer.complete_json()) # Expected: {"key": "value with \"quote\" and \\slash"}
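When a JSON fragment containing escape sequences is written as a Python literal, it matters whether the literal is raw. In a regular string, Python consumes `\"` and `\\` before the text ever reaches the lexer. The sketch below uses only the standard library to show the difference; the `is_valid_json` helper is a hypothetical name introduced for illustration.

```python
import json

# Regular literal: Python turns \" into a bare quote before anyone sees it.
plain = '{"key": "say \"hi\""}'
# Raw literal: backslashes survive, so the text is the JSON it appears to be.
raw = r'{"key": "say \"hi\""}'


def is_valid_json(text: str) -> bool:
    """Return True if the standard json module accepts the text."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(plain)                 # {"key": "say "hi""}
print(raw)                   # {"key": "say \"hi\""}
print(is_valid_json(plain))  # False
print(is_valid_json(raw))    # True
```

The same consideration applies to test fixtures: text read from a file or a network stream already contains the backslashes, so only fragments typed into source code need the `r'...'` prefix.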