NDJSON Decoder for Python
The `ndjson` library for Python, currently at version `0.3.1`, provides a decoder and an encoder (`ndjson.Decoder` and `ndjson.Encoder`) for newline-delimited JSON (NDJSON), also known as JSON Lines. Its interface mirrors Python's built-in `json` module, so you can read NDJSON from, and write it to, file-like objects and strings with the familiar `load`/`loads`/`dump`/`dumps` calls. The library is lightweight and has no external dependencies. It is particularly useful for large datasets and streaming applications, where each line is a complete, independent JSON value. Its last release was in 2020 and its PyPI status is "Pre-Alpha", but it is stable and works for its stated purpose.
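The decoder/encoder classes let the library reuse the `json` module's interface, because `json.loads` and `json.load` accept a `cls=` argument that names a `json.JSONDecoder` subclass. The sketch below shows that mechanism using only the standard library. `NdjsonDecoder` is a hypothetical name, and this is not necessarily how `ndjson` implements its own decoder.

```python
import json


class NdjsonDecoder(json.JSONDecoder):
    """Hypothetical decoder: turns NDJSON text into a list of Python objects."""

    def decode(self, s, _w=None):
        records = []
        for line in s.splitlines():
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            # Delegate each individual line to the standard JSON decoder.
            records.append(super().decode(line))
        return records


text = '{"name": "Alice", "age": 30}\n{"name": "Bob", "age": 24}\n'
rows = json.loads(text, cls=NdjsonDecoder)
print(rows)  # [{'name': 'Alice', 'age': 30}, {'name': 'Bob', 'age': 24}]
```

Because `json.loads` passes the text straight to `cls(...).decode(s)`, the caller keeps the standard `json` call signature and only swaps the decoder class.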
Warnings
- gotcha The `ndjson` library's official PyPI status is "2 - Pre-Alpha", which suggests an unstable or experimental project. In practice, the library has been stable since its 0.3.1 release in February 2020 and works as advertised, so the status label understates its maturity.
- gotcha Do not parse an entire NDJSON file with Python's built-in `json.load()` (e.g., `json.load(open('data.ndjson'))`). An NDJSON file contains many top-level JSON values rather than a single one, so this typically raises `json.JSONDecodeError` ("Extra data"). `json.load()` also reads the whole file into memory before parsing, so very large files can exhaust memory.
- gotcha When writing NDJSON, put each record on a single line as a valid, self-contained JSON value, terminated by a newline character (`\n`). Do not wrap the records in a JSON array (`[]`), add commas between them, or pretty-print them with `indent`. Each of these breaks the one-record-per-line format and causes parsing errors.
- deprecated The library's last release (`0.3.1`) was in February 2020. Its core NDJSON functionality is stable and complete, but users who need active maintenance, further bug fixes, or new features should expect the project to be inactive.
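- gotcha NDJSON files are expected to be UTF-8 encoded. If a file begins with a UTF-8 byte order mark (BOM) and is opened with `encoding='utf-8'`, parsing fails on the first line. Open it with `encoding='utf-8-sig'` instead, which strips the mark when present. As with any text-based format, files with mixed or non-UTF-8 encodings fail to decode.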
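To see why `json.load()`/`json.loads()` reject NDJSON while per-line parsing works, here is a stdlib-only demonstration on an in-memory string. It illustrates the format and does not use the `ndjson` API.

```python
import json

ndjson_text = '{"id": 1}\n{"id": 2}\n{"id": 3}\n'

# Parsing the whole text as one JSON document fails after the first record.
error_message = None
try:
    json.loads(ndjson_text)
except json.JSONDecodeError as exc:
    error_message = exc.msg
print("json.loads failed:", error_message)  # json.loads failed: Extra data

# Parsing one line at a time works, because every line is a complete JSON value.
records = [json.loads(line) for line in ndjson_text.splitlines() if line.strip()]
print(records)  # [{'id': 1}, {'id': 2}, {'id': 3}]
```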
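The writing rules above come down to one `json.dumps` call per record, followed by `'\n'`. The following is a minimal stdlib-only sketch, not the `ndjson` API. It writes into an `io.StringIO` buffer so it runs without touching the disk; the sample records are made up for illustration.

```python
import io
import json

records = [
    {"name": "Alice", "note": "line one\nline two"},
    {"name": "Bob", "tags": ["a", "b"]},
]

buffer = io.StringIO()
for record in records:
    # json.dumps without indent= emits a single line; a newline inside a string
    # value is escaped as \n, so it cannot break the one-record-per-line rule.
    buffer.write(json.dumps(record) + "\n")

output = buffer.getvalue()
print(output, end="")
```

The output has no surrounding `[...]` and no commas between records. Every line can be handed to `json.loads` on its own.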
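The BOM problem can be reproduced with the standard library alone. The sketch below decodes the same bytes with `"utf-8"` and with `"utf-8-sig"`. Passing `encoding='utf-8-sig'` to `open()` has the same effect when reading from disk.

```python
import json

# A UTF-8 BOM followed by two NDJSON records, as some editors save files.
raw = b"\xef\xbb\xbf" + b'{"id": 1}\n{"id": 2}\n'

# Decoding with plain "utf-8" keeps the BOM as U+FEFF at the start of the text,
# and json.loads rejects it on the first line.
first_line = raw.decode("utf-8").splitlines()[0]
bom_error = None
try:
    json.loads(first_line)
except json.JSONDecodeError as exc:
    bom_error = exc.msg

# Decoding with "utf-8-sig" strips the BOM, so every line parses.
text_sig = raw.decode("utf-8-sig")
records = [json.loads(line) for line in text_sig.splitlines() if line.strip()]
print(bom_error)
print(records)  # [{'id': 1}, {'id': 2}]
```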
Install
pip install ndjson
Imports
- ndjson
import ndjson
- load
ndjson.load(file_object)
- dump
ndjson.dump(data, file_object)
- loads
ndjson.loads(string_data)
- dumps
ndjson.dumps(data)
- reader
ndjson.reader(file_object)
- writer
ndjson.writer(file_object)
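Streaming access of the kind `reader(file_object)` offers amounts to iterating a text file line by line, so only one record is held in memory at a time. Below is a stdlib-only sketch of that idea. `iter_ndjson` is a hypothetical helper, not part of `ndjson`. Skipping blank lines and reporting errors with line numbers are design choices of this sketch, not documented library behavior.

```python
import io
import json
from typing import Any, Iterator, TextIO


def iter_ndjson(f: TextIO) -> Iterator[Any]:
    """Hypothetical helper: lazily yield one decoded record per non-blank line."""
    for line_number, line in enumerate(f, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing empty line
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            # Report which line was bad instead of failing anonymously.
            raise ValueError(f"invalid JSON on line {line_number}: {exc.msg}") from exc


sample = io.StringIO('{"id": 1}\n\n{"id": 2}\n')
for record in iter_ndjson(sample):
    print(record)
```

A real file object opened with `open(path, encoding='utf-8')` can be passed in place of the `io.StringIO` sample, because both iterate line by line.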
Quickstart
import ndjson
import os
# Example data
data_to_write = [
    {"name": "Alice", "age": 30, "city": "New York"},
    {"name": "Bob", "age": 24, "city": "San Francisco"},
    {"name": "Charlie", "age": 35, "city": "London"},
]
file_path = "example.ndjson"
# --- Writing NDJSON to a file ---
with open(file_path, 'w', encoding='utf-8') as f:
    # ndjson.dump writes a whole list of objects, one JSON object per line
    ndjson.dump(data_to_write, f)
print(f"Data written to {file_path} using ndjson.dump")
# Alternatively, ndjson.writer writes one row at a time (useful for streaming)
file_path_writer = "example_writer.ndjson"
with open(file_path_writer, 'w', encoding='utf-8') as f:
    writer = ndjson.writer(f)
    for record in data_to_write:
        writer.writerow(record)
print(f"Data written to {file_path_writer} using ndjson.writer")
# --- Reading NDJSON from a file ---
with open(file_path, 'r', encoding='utf-8') as f:
    # ndjson.load reads every object in the file into a list
    read_data_dump = ndjson.load(f)
print(f"\nData read from {file_path} (ndjson.load):\n{read_data_dump}")
# Alternatively, ndjson.reader yields one decoded row at a time
read_data_reader = []
with open(file_path_writer, 'r', encoding='utf-8') as f:
    reader = ndjson.reader(f)
    for row in reader:
        read_data_reader.append(row)
print(f"\nData read from {file_path_writer} (ndjson.reader):\n{read_data_reader}")
# Clean up created files
os.remove(file_path)
os.remove(file_path_writer)