jsonlines Python Library
jsonlines is an active Python library (version 4.0.0) for working with the JSON Lines text file format, also known as NDJSON. It reads and writes streams of newline-delimited JSON values and handles both string and byte streams transparently. It can use faster JSON parsers such as `orjson` and `ujson` when they are installed, can validate the type of each value as it is read, and reports invalid lines through dedicated exceptions. The writer always emits standard-compliant line breaks.
Warnings
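- gotcha Use `jsonlines.open()`, `jsonlines.Reader`, and `jsonlines.Writer` as context managers (in a `with` statement), or call `.close()` explicitly. Otherwise buffered data may never be written and file handles may stay open. A `Reader` or `Writer` wrapped around a file object you opened yourself does not close that object; only files opened through `jsonlines.open()` are closed by the library.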
- gotcha The JSON Lines format holds one complete JSON value (usually an object) per line, with lines separated by a newline character. It is NOT a single large JSON array. Reading a traditional JSON array file with `jsonlines` either fails on the first line (pretty-printed arrays) or returns the whole array as a single value (single-line arrays).
- gotcha When providing a custom `dumps` callable to `jsonlines.Writer`, the `compact` and `sort_keys` arguments will be ignored. The custom callable takes precedence over these built-in formatting options.
- gotcha The JSON Lines format uses `\n` as its line separator. When *generating* files that other tools will consume, keep line endings consistently `\n`, even though Python's file I/O tolerates other newline conventions when reading. The library's writer emits standard-compliant line breaks.
- breaking Some external command-line tools (e.g., DuckDB CLI v1.5.0) have changed their `--jsonlines` flag behavior or removed it, opting instead for `--json` which outputs a single JSON array (not JSONL). This can silently break ETL pipelines that expect newline-delimited JSON. This is a common point of confusion in the ecosystem, not a direct breaking change in this `jsonlines` Python library, but relevant to its users.
Install
pip install jsonlines
Imports
- open
from jsonlines import open
- Reader
from jsonlines import Reader
- Writer
from jsonlines import Writer
Quickstart
import jsonlines
import io
import os
# Create a dummy file for demonstration
output_file = "example_data.jsonl"
data_to_write = [
    {"id": 1, "name": "Alice", "email": "alice@example.com"},
    {"id": 2, "name": "Bob", "email": "bob@example.com", "status": "active"},
    {"id": 3, "name": "Charlie", "data": {"city": "New York", "zip": "10001"}},
]
# Writing JSON Lines data using the convenience 'open' function
with jsonlines.open(output_file, mode='w') as writer:
    writer.write_all(data_to_write)
print(f"Wrote {len(data_to_write)} records to {output_file}")
# Reading JSON Lines data
print("\nReading records:")
read_records = []
with jsonlines.open(output_file) as reader:
    for obj in reader:
        read_records.append(obj)
        print(obj)
print(f"Total records read: {len(read_records)}")
assert read_records == data_to_write
# Example of writing to an in-memory buffer using Writer class
buffer = io.StringIO()
with jsonlines.Writer(buffer) as writer:
    writer.write({"log_event": "started", "timestamp": "2023-01-01T12:00:00Z"})
    writer.write({"log_event": "processed", "item_id": 123})
buffer.seek(0) # Reset buffer position to read
print("\nReading from in-memory buffer:")
with jsonlines.Reader(buffer) as reader:
    for log_entry in reader:
        print(log_entry)
# Clean up the dummy file
os.remove(output_file)
print(f"\nCleaned up {output_file}")