JsonSlicer
JsonSlicer is a Python library (version 0.1.8) designed for efficient stream or iterative JSON parsing. It processes JSON documents without loading the entire structure into memory, making it suitable for very large files or streams. Written in C and leveraging the YAJL JSON parsing library, it offers high performance. JsonSlicer provides an iterator interface to extract specific data by defining paths using map keys and array indices, yielding matching JSON data as complete Python objects. Its release cadence is irregular, with minor updates addressing compatibility and performance.
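The path rule described above (map keys, array indices, and `None` as a wildcard, as used in the Quickstart) can be illustrated with a small pure-Python sketch. This is not JsonSlicer's implementation: the helper `iter_matches` is hypothetical, and it walks an already-loaded object, so it shows only how a path selects values, not the streaming behaviour.

```python
def iter_matches(obj, pattern, depth=0):
    """Yield every value in obj whose path matches pattern.

    pattern is a tuple of map keys, array indices, or None (wildcard).
    Hypothetical helper for illustration only; JsonSlicer does the
    equivalent while streaming, without loading the whole document.
    """
    if depth == len(pattern):
        # The whole pattern has been consumed: this value is a match.
        yield obj
        return
    want = pattern[depth]
    if isinstance(obj, dict):
        children = obj.items()
    elif isinstance(obj, list):
        children = enumerate(obj)
    else:
        return  # scalar reached before the pattern was exhausted
    for key, value in children:
        if want is None or want == key:
            yield from iter_matches(value, pattern, depth + 1)


data = {
    "friends": [
        {"name": "John", "age": 31},
        {"name": "Ivan", "age": 26},
    ],
    "colleagues": {
        "manager": {"name": "Jack", "age": 33},
        "subordinate": {"name": "Lucy", "age": 21},
    },
}

print(next(iter_matches(data, ("friends", 1, "age"))))  # 26
print(list(iter_matches(data, ("friends", None))))      # both friend objects
print(len(list(iter_matches(data, (None, None)))))      # 4
```

A path shorter than the document depth yields whole sub-objects, which is why `('friends', None)` returns complete person dictionaries rather than individual fields.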
Warnings
- gotcha Installation compiles a native extension, so it needs a C++ compiler, `pkg-config`, and the YAJL library with its development files. `pip install` may fail on systems without these development tools.
- gotcha For optimal performance, especially with very large JSON files, `JsonSlicer` prefers binary input. Using text input (e.g., a file opened without `rb` mode) incurs a ~3% performance overhead due to unnecessary encoding/decoding.
- gotcha While generally faster than pure Python JSON parsers, `jsonslicer` can sometimes be slower than `ijson` with its C backend for very deep data structures if many intermediate C values need to be converted to Python objects.
- deprecated Versions prior to 0.1.5 may not work correctly on Python 3.8 and newer; the compatibility fix was introduced in 0.1.5, so upgrade to that release or later.
- gotcha Default error messages for malformed JSON can be terse. Setting the `yajl_verbose_errors` flag enables more detailed error messages, which helps when debugging broken input.
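Two of the warnings above (binary input and verbose errors) can be combined in one call. The following is a minimal sketch, assuming `yajl_verbose_errors` is accepted as a keyword argument of the `JsonSlicer` constructor; the function name `friend_names` and the `friends` layout (borrowed from the Quickstart data) are illustrative only.

```python
def friend_names(path):
    """Yield each friend's name from a JSON file read in binary mode."""
    # Imported lazily so that this sketch can be loaded even where the
    # native extension has not been built yet.
    from jsonslicer import JsonSlicer

    # 'rb' gives JsonSlicer binary input, which avoids the text-mode overhead.
    with open(path, 'rb') as data_file:
        # Assumption: yajl_verbose_errors is passed as a constructor keyword.
        yield from JsonSlicer(
            data_file,
            ('friends', None, 'name'),
            yajl_verbose_errors=True,
        )
```

Called as `list(friend_names('people.json'))`, this would stream through the file once and collect only the names, without building the full document in memory.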
Install
pip install jsonslicer
Imports
- JsonSlicer
from jsonslicer import JsonSlicer
Quickstart
import os
import json
from jsonslicer import JsonSlicer

# Create a dummy JSON file for demonstration
data = {
    "friends": [
        {"name": "John", "age": 31},
        {"name": "Ivan", "age": 26}
    ],
    "colleagues": {
        "manager": {"name": "Jack", "age": 33},
        "subordinate": {"name": "Lucy", "age": 21}
    }
}
with open('people.json', 'w') as f:
    json.dump(data, f)

# Extract a specific element
with open('people.json') as data_file:
    ivans_age = next(JsonSlicer(data_file, ('friends', 1, 'age')))
print(f"Ivan's age: {ivans_age}")  # Expected: Ivan's age: 26

# Iterate over a collection using wildcards (None)
with open('people.json') as data_file:
    print("\nFriends:")
    for person in JsonSlicer(data_file, ('friends', None)):
        print(person)  # Expected: {'name': 'John', 'age': 31}, {'name': 'Ivan', 'age': 26}

# Iterate over different types of collections at once
with open('people.json') as data_file:
    print("\nAll people:")
    for person in JsonSlicer(data_file, (None, None)):
        print(person)  # Expected: all 4 people objects

# Clean up the dummy file
os.remove('people.json')