{"id":4062,"library":"json-stream-rs-tokenizer","title":"JSON Stream Rust Tokenizer","description":"A faster tokenizer for the `json-stream` Python library. This package ports `json-stream`'s internal tokenizer to Rust using PyO3, offering a significant parsing speedup (4-10x on CPython). As of `json-stream` 2.0, it is detected and used automatically, so explicit installation or usage is generally unnecessary. It supports Python >=3.8,<3.15 and is actively maintained.","status":"active","version":"0.5.1","language":"en","source_language":"en","source_url":"https://github.com/smheidrich/py-json-stream-rs-tokenizer","tags":["json","streaming","tokenizer","rust","performance","json-stream"],"install":[{"cmd":"pip install json-stream-rs-tokenizer","lang":"bash","label":"Install latest version"}],"dependencies":[{"reason":"This library provides a faster tokenizer specifically for `json-stream`. `json-stream` v2.0+ automatically uses it if installed.","package":"json-stream","optional":false}],"imports":[{"note":"The Rust tokenizer is exposed directly from `json_stream_rs_tokenizer`, not an internal `json_stream` submodule.","wrong":"from json_stream.tokenizer import RustTokenizer","symbol":"RustTokenizer","correct":"from json_stream_rs_tokenizer import RustTokenizer"},{"note":"This provides a convenience wrapper around `json_stream.load` that pre-configures the Rust tokenizer.","symbol":"load","correct":"from json_stream_rs_tokenizer import load"}],"quickstart":{"code":"from io import StringIO\nfrom json_stream import load as json_stream_load\nfrom json_stream_rs_tokenizer import RustTokenizer\n\n# Example JSON data\njson_buf = StringIO('{ \"a\": [1,2,3,4], \"b\": [5,6,7] }')\n\n# Explicitly use the Rust tokenizer with json_stream.load\nd = json_stream_load(json_buf, tokenizer=RustTokenizer)\n\n# The stream is transient by default, so iterate over it only once\nfor k, values in d.items():\n    print(f\"{k}: {' '.join(str(n) for n in values)}\")\n# Expected 
output for 'a': a: 1 2 3 4\n\n# Alternatively, use the convenience wrapper from json_stream_rs_tokenizer\nfrom json_stream_rs_tokenizer import load as rs_load\njson_buf_alt = StringIO('{ \"x\": \"hello\", \"y\": \"world\" }')\ndata_alt = rs_load(json_buf_alt)\nprint(f\"Value of x: {data_alt['x']}\") # Output: Value of x: hello\n","lang":"python","description":"This quickstart shows how to pass `RustTokenizer` explicitly to `json-stream`'s `load` function and how to use the convenience `load` wrapper provided by this package."},"warnings":[{"fix":"Update `json-stream` to 2.0 or later. With `json-stream-rs-tokenizer` installed, it is then used implicitly and no code changes are needed.","message":"Starting with `json-stream` version 2.0, this tokenizer is used automatically if available. Explicitly installing `json-stream-rs-tokenizer` or passing `RustTokenizer` to `json_stream.load()` is usually unnecessary unless you are on an older `json-stream` version or need a specific configuration.","severity":"gotcha","affected_versions":"json-stream >= 2.0"},{"fix":"Ensure a Rust toolchain is installed and accessible in your environment (for example via rustup: `curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh`). Increase pip verbosity (`pip install -vv json-stream-rs-tokenizer`) to see detailed build logs and diagnose Rust compilation errors.","message":"Installation from source requires a Rust toolchain. If no prebuilt wheel is available for your platform and Python version, pip attempts to build from source. A failed Rust build might still be reported as a successful package installation, but `RustTokenizer` will not be importable.","severity":"breaking","affected_versions":"All versions (when installing from source)"},{"fix":"Use a regular (non-editable) install when performance matters; for development builds, make sure cargo is invoked with `--release`. 
Check `pip install -v` output for Rust compilation flags.","message":"When installed in editable/development mode, the Rust library might be compiled in debug mode, which can make it *slower* than the pure-Python tokenizer.","severity":"gotcha","affected_versions":"All versions (when installing in debug/development mode)"},{"fix":"Be aware of this limitation when deploying on PyPy; expect little or no speedup there.","message":"On PyPy, the performance improvement from `json-stream-rs-tokenizer` is significantly lower (1.0-1.5x) than on CPython (4-10x).","severity":"gotcha","affected_versions":"All versions (on PyPy)"},{"fix":"Avoid `correct_cursor=True` for non-seekable streams unless it is strictly necessary for parsing mixed data. Consider refactoring the data format or using seekable streams if possible.","message":"When parsing mixed data with `json_stream.load(..., correct_cursor=True)`, keeping track of the exact stream position for non-seekable streams incurs a significant performance cost.","severity":"gotcha","affected_versions":"All versions (when using `correct_cursor=True` with non-seekable streams)"}],"env_vars":null,"last_verified":"2026-04-11T00:00:00.000Z","next_check":"2026-07-10T00:00:00.000Z"}