JSON Text Sequences (RFC 7464)
jsonseq is a Python library that provides support for encoding and decoding JSON text sequences as defined by RFC 7464. This format is designed for streaming multiple JSON objects, each prefixed by an ASCII Record Separator (0x1E) and terminated by a Line Feed (0x0A). It's particularly useful for handling large or indefinite streams of JSON data incrementally. The current version is 1.0.0, with releases historically following an as-needed cadence.
Common errors
-
json.decoder.JSONDecodeError: Expecting value: line 1 column X (char Y)
cause An individual JSON object within the sequence is syntactically malformed (e.g., missing quotes, commas, or incorrect primitive types).fixEnsure all JSON objects passed to `JSONSeqEncoder` are valid, or, when decoding, wrap the iteration over `JSONSeqDecoder` in a `try...except json.JSONDecodeError` block to catch and handle malformed records gracefully. Log the error details (line, column, char) to pinpoint the problematic data. -
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xXX in position Y: invalid start byte
cause The input stream contains bytes that are not valid UTF-8, but the decoder is attempting to interpret them as such.fixVerify the actual encoding of the source data. If it's not UTF-8 (e.g., Latin-1, Windows-1252), you'll need to decode the raw bytes stream with the correct encoding *before* passing it to `JSONSeqDecoder` or ensure the file/network stream is opened with the appropriate `encoding` parameter. -
SyntaxError: unexpected EOF while parsing
cause The Python interpreter encountered the end of a file or stream prematurely while attempting to parse an incomplete JSON object, often due to truncation or an unexpected end of the input stream.fixThis typically indicates an incomplete JSON object or a truncated stream. Ensure the entire JSON text sequence is available to the decoder. If reading from a file, verify the file is not truncated. When processing network streams, handle `EOFError` or ensure all expected data chunks are received.
Warnings
- gotcha RFC 7464 specifies that each JSON text *must* be prefixed by an ASCII Record Separator (RS, 0x1E). However, the `jsonseq` Python library's `JSONSeqEncoder` makes the initial RS optional for encoding, and `JSONSeqDecoder` is lenient, accepting sequences that omit the very first RS. While convenient, strict interoperability with other RFC 7464 implementations might expect the first RS.
- gotcha The RFC 7464 standard suggests that parsers 'SHOULD NOT abort when an octet string contains a malformed JSON text' but instead report the failure and continue. The `jsonseq` library's `JSONSeqDecoder` wraps `json.loads`, which will raise a `json.JSONDecodeError` upon encountering malformed JSON within the sequence. This will halt iteration rather than skipping the bad record and continuing.
- gotcha RFC 7464 explicitly states that all JSON text sequences must be encoded in UTF-8. While Python 3 generally handles Unicode well, incorrect handling of the underlying stream's encoding can lead to `UnicodeDecodeError`.
Install
-
pip install jsonseq
Imports
- JSONSeqEncoder
from jsonseq.encode import JSONSeqEncoder
- JSONSeqDecoder
from jsonseq.decode import JSONSeqDecoder
Quickstart
import io
from jsonseq.encode import JSONSeqEncoder
from jsonseq.decode import JSONSeqDecoder
# --- Encoding ---
data_to_encode = [
{'id': 1, 'name': 'Alice'},
{'id': 2, 'name': 'Bob', 'details': {'age': 30}},
{'id': 3, 'name': 'Charlie'}
]
# Encode to a BytesIO stream
output_buffer = io.BytesIO()
encoder = JSONSeqEncoder(output_buffer)
for obj in data_to_encode:
encoder.encode(obj)
encoded_bytes = output_buffer.getvalue()
print("Encoded JSONSeq:")
print(encoded_bytes.decode('utf-8'))
# --- Decoding ---
# Simulate reading from the stream
input_buffer = io.BytesIO(encoded_bytes)
decoder = JSONSeqDecoder(input_buffer)
decoded_objects = []
for obj in decoder:
decoded_objects.append(obj)
print("\nDecoded objects:")
for obj in decoded_objects:
print(obj)