pysimdjson
pysimdjson provides high-performance Python bindings for simdjson, a SIMD-accelerated JSON parser written in C++. It offers a compatibility API modeled on Python's built-in `json` module and a native `Parser` API that is considerably faster, especially when only parts of a JSON document are needed. The package is installed as `pysimdjson` but imported as `simdjson`. The library is actively maintained; the current version is 7.0.2.
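A common way to adopt the compatibility API is to prefer pysimdjson when it is installed and fall back to the standard library otherwise. This is a minimal sketch: `load_json` and `BACKEND` are hypothetical names, and it assumes only that `simdjson.loads` (pysimdjson's module is named `simdjson`) accepts `bytes` or `str` and returns plain Python objects, as `json.loads` does.

```python
# Prefer pysimdjson (module name `simdjson`); fall back to the stdlib json module.
try:
    import simdjson as _backend
    BACKEND = "simdjson"
except ImportError:
    import json as _backend
    BACKEND = "json"


def load_json(data):
    """Parse JSON from bytes or str into plain dicts, lists, str, int, float, bool, None."""
    return _backend.loads(data)


if __name__ == "__main__":
    print(BACKEND, load_json(b'{"a": [1, 2.5, null]}'))
```

Because callers only ever use `load_json`, they do not need to know which backend was chosen. Error types and edge-case behavior may still differ between the two backends, so it is worth testing against both.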
Warnings
- breaking Support for Python 3.5 and 3.6 was removed in earlier major releases, and current versions (>=7.0.0) require Python 3.9 or newer. Check that your environment meets this requirement before installing.
- gotcha For the best performance, especially with large documents, avoid materializing the whole document as Python objects. Use the native `Parser` API and extract only the parts you need, either with `at_pointer()` (for example `doc.at_pointer('/details/occupation')`) or with direct proxy access such as `doc['key']`.
- gotcha A `simdjson.Parser` instance holds only one parsed document at a time. Before calling `parse()` or `load()` on a parser again, make sure no `Object` or `Array` proxies from the previous document are still referenced; otherwise the call raises a `RuntimeError`, because those proxies still refer to the document the parser owns.
- gotcha pysimdjson works primarily on `bytes` and assumes UTF-8 encoding. Unlike the standard `json` module, which also accepts UTF-16 and UTF-32 `bytes`, it offers no way to use another encoding. Passing a `str` is supported but slower, because it must be encoded internally first.
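The reuse and encoding gotchas can be handled together in a small loop. This is a sketch with hypothetical helpers (`to_utf8`, `collect_names`), assuming each document has a top-level `"name"` string. It takes any parse function, so you can pass `simdjson.Parser().parse` or, for testing, `json.loads`. The key detail is `del doc`: in `doc = parse(...)` the old `doc` is still bound while `parse()` runs, so it must be released first.

```python
import codecs
import json
from typing import Any, Callable, Iterable, List


def to_utf8(raw: bytes, encoding: str = "utf-8") -> bytes:
    """Transcode raw bytes to UTF-8, since pysimdjson assumes UTF-8 input."""
    if codecs.lookup(encoding).name == "utf-8":
        return raw  # already UTF-8, nothing to do
    return raw.decode(encoding).encode("utf-8")


def collect_names(
    payloads: Iterable[bytes],
    parse: Callable[[bytes], Any],
    encoding: str = "utf-8",
) -> List[str]:
    """Parse several documents with one parse function, keeping only plain values."""
    names = []
    for raw in payloads:
        doc = parse(to_utf8(raw, encoding))
        # Copy the needed value out as a plain Python str...
        names.append(str(doc["name"]))
        # ...and drop the document proxy before the next parse() call.
        del doc
    return names


if __name__ == "__main__":
    try:
        import simdjson
        parse = simdjson.Parser().parse  # one parser reused for every document
    except ImportError:
        parse = json.loads
    docs = [b'{"name": "Alice"}', '{"name": "Jos\u00e9"}'.encode("latin-1")]
    print(collect_names(docs, parse, encoding="latin-1"))
```

Copying values out (or calling `as_dict()` / `as_list()` on a subtree) before the next parse keeps no proxy alive across parser reuse.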
Install
-
pip install pysimdjson
Imports
- Parser
from simdjson import Parser
- loads
from simdjson import loads
Quickstart
# pysimdjson is installed as `pysimdjson` but imported as `simdjson`
from simdjson import Parser, loads
json_data = b'{"name": "Alice", "age": 30, "city": "New York", "details": {"occupation": "Engineer", "hobbies": ["reading", "hiking"]}}'
# Native Parser API: fast, and only the parts you access are converted
parser = Parser()
# Parsing bytes is generally fastest
doc = parser.parse(json_data)
# Objects and arrays come back as lazy proxies; strings, numbers,
# booleans and null come back as ordinary Python values
name = doc['name']
age = doc['age']
occupation = doc['details']['occupation']
first_hobby = doc['details']['hobbies'][0]
print(f"Name: {name}, Age: {age}")
print(f"Occupation: {occupation}, First Hobby: {first_hobby}")
# Convert a subtree to a plain Python dict only when you need one
details_dict = doc['details'].as_dict()
print(f"Details as dict: {details_dict}")
# Compatibility API: loads the whole document, like json.loads
full_python_obj = loads(json_data)
print(f"Full Python object (loads): {full_python_obj}")
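Chained indexing such as `doc['details']['hobbies'][0]` can also be written as a single JSON Pointer string (RFC 6901), which is the form `at_pointer()` takes. The sketch below is a hypothetical helper, not part of pysimdjson: when given a pysimdjson proxy it delegates to `at_pointer()` (assuming that method accepts a pointer string like `'/details/hobbies/0'`), and otherwise walks plain dicts and lists, such as the output of `loads`.

```python
from typing import Any


def _unescape(token: str) -> str:
    # RFC 6901: "~1" means "/" and "~0" means "~"; "~1" must be replaced first.
    return token.replace("~1", "/").replace("~0", "~")


def get_pointer(doc: Any, pointer: str) -> Any:
    """Resolve a JSON Pointer such as "/details/hobbies/0".

    Delegates to doc.at_pointer() when available (pysimdjson proxies), so
    nothing else is materialized; otherwise walks plain dicts and lists.
    Simplified: array tokens are parsed with int(), and the "-" token is unsupported.
    """
    at_pointer = getattr(doc, "at_pointer", None)
    if callable(at_pointer):
        return at_pointer(pointer)
    if pointer == "":
        return doc  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON Pointer: {pointer!r}")
    node = doc
    for raw_token in pointer[1:].split("/"):
        token = _unescape(raw_token)
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


if __name__ == "__main__":
    data = {"details": {"occupation": "Engineer", "hobbies": ["reading", "hiking"]}}
    print(get_pointer(data, "/details/hobbies/1"))
```

With a pysimdjson document, `get_pointer(doc, '/details/occupation')` would simply forward to `doc.at_pointer(...)`.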