A fast, spec-compliant Python 3.14+ tokenizer

Version 0.4.1 (verified Tue May 12); auth: no; python install: verified; quickstart: stale

Pytokens is an open-source Python library that provides a fast, spec-compliant tokenizer for Python 3.14+ and also runs on older Python versions (3.8 and later). It is currently at version 0.4.1 and appears to be actively maintained, with recent releases focused on packaging and development improvements.

pip install pytokens
error ModuleNotFoundError: No module named 'pytokens'
cause The `pytokens` library is not installed in the Python environment you are running, or the environment it was installed into is not activated.
fix
Install the library with the same interpreter that will import it: python -m pip install pytokens
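To confirm which interpreter is active and whether it can find the package, a stdlib-only check is enough. This is a minimal sketch; `is_installed` is a hypothetical helper name, not part of pytokens.

```python
import importlib.util
import sys


def is_installed(name: str) -> bool:
    """Return True if the running interpreter can find top-level module `name`."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # pip must install into *this* interpreter's environment.
    print("interpreter:", sys.executable)
    print("pytokens importable:", is_installed("pytokens"))
```

If this reports False, run the install through the printed interpreter path (`<interpreter> -m pip install pytokens`).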
error TypeError: an integer is required (got type dict)
cause `pytokens.tokenize` received an argument that is not a string (here a `dict`); it expects the Python source code as a `str`.
fix
Always pass a string containing the Python source code to `pytokens.tokenize()`, for example: `pytokens.tokenize("def func():\n    pass")`
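Source code often arrives as bytes, for example when a file is read in binary mode. It can be decoded to `str` before tokenizing. The sketch below uses a hypothetical helper, `as_source_text`, which relies on the standard library's `tokenize.detect_encoding` to honor PEP 263 coding declarations. It assumes, as this page states, that `pytokens.tokenize` accepts a `str`.

```python
import io
import tokenize
from typing import Union


def as_source_text(data: Union[str, bytes, bytearray]) -> str:
    """Return Python source code as str, suitable for a str-only tokenizer."""
    if isinstance(data, str):
        return data
    if isinstance(data, (bytes, bytearray)):
        # Honor a PEP 263 coding cookie or a UTF-8 BOM; the default is UTF-8.
        encoding, _ = tokenize.detect_encoding(io.BytesIO(data).readline)
        return bytes(data).decode(encoding)
    raise TypeError(f"expected str or bytes, got {type(data).__name__}")


# Assumed usage (pytokens is not imported in this sketch):
#   tokens = pytokens.tokenize(as_source_text(raw))
```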
error ERROR: No matching distribution found for pytokens
cause pip could not find a distribution of `pytokens` that is compatible with the current Python version (3.8 or newer is required) or platform.
fix
Ensure you are using Python 3.8 or newer; if the issue persists, check PyPI for available wheels for your specific environment.
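As a quick pre-install check, the interpreter version can be compared against the minimum stated on this page. This is a sketch only: `MIN_PYTHON` is copied from this page's text and is not read from pytokens' own package metadata.

```python
import sys
from typing import Sequence, Tuple

# Minimum version stated on this page (an assumption, not read from PyPI).
MIN_PYTHON: Tuple[int, int] = (3, 8)


def meets_minimum(version: Sequence[int] = sys.version_info,
                  minimum: Sequence[int] = MIN_PYTHON) -> bool:
    """Return True if the (major, minor) part of `version` is at least `minimum`."""
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    print(sys.executable, tuple(sys.version_info[:3]),
          "OK" if meets_minimum() else "too old for pytokens")
```

Running pip as `python -m pip` ensures that the version checked here is the one pip installs for.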
error ModuleNotFoundError: No module named 'pytokens.tokenize'
cause `tokenize` is a function exported by the `pytokens` package, not a submodule, so it cannot be imported with dotted module syntax such as `import pytokens.tokenize`.
fix
Import the function from the package with `from pytokens import tokenize`, or call `pytokens.tokenize` after `import pytokens`.
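A standard-library package reproduces the same function-versus-submodule distinction. `json.loads` is a function, so importing `json.loads` as a module fails in the same way. The helper `import_attr` is hypothetical; it imports the module first and then fetches the attribute, which is what `from package import name` does.

```python
import importlib
from typing import Any


def import_attr(module_name: str, attr: str) -> Any:
    """Import a module, then fetch a function or other attribute from it.

    A dotted path such as "pkg.func" cannot be imported as a module
    when `func` is a function rather than a submodule.
    """
    module = importlib.import_module(module_name)
    return getattr(module, attr)


# Hypothetical usage with this page's package (requires pytokens installed):
#   tokenize = import_attr("pytokens", "tokenize")
```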
gotcha Pytokens is compiled with mypyc by default for performance. This means the installed module might be a compiled extension (.so or .pyd) rather than pure Python code, which can affect debugging or introspection.
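fix To disable mypyc compilation (e.g., for local development or debugging), set the environment variable PYTOKENS_USE_MYPYC=0 before installation (e.g., PYTOKENS_USE_MYPYC=0 pip install pytokens). The variable can only take effect when pip builds pytokens from source, so if pip would otherwise install a prebuilt wheel, force a source build: PYTOKENS_USE_MYPYC=0 pip install --no-binary pytokens pytokens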
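To check which variant is installed, inspect the module's file. The sketch below assumes that a mypyc build replaces the package's `.py` files with extension modules, so `pytokens.__file__` would end in one of the interpreter's extension suffixes. The helper names are hypothetical.

```python
import importlib.machinery
from types import ModuleType
from typing import Optional

# Filename suffixes this interpreter loads as compiled extension modules,
# e.g. ".cpython-312-x86_64-linux-gnu.so", ".abi3.so", ".so" or ".pyd".
EXTENSION_SUFFIXES = tuple(importlib.machinery.EXTENSION_SUFFIXES)


def is_compiled_extension(path: Optional[str]) -> bool:
    """Return True if `path` names a compiled extension module file."""
    if not path:
        return False  # built-in and namespace modules have no file
    return path.endswith(EXTENSION_SUFFIXES)


def module_is_compiled(module: ModuleType) -> bool:
    """Check an already-imported module by its __file__ attribute."""
    return is_compiled_extension(getattr(module, "__file__", None))


# Hypothetical usage, assuming pytokens is installed:
#   import pytokens
#   print(module_is_compiled(pytokens))
```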
gotcha Pytokens follows the Python 3.14+ tokenizer specification even when it runs on older interpreters (3.8 and later). Its output can therefore differ from the native `tokenize` module of those interpreters, for example in how f-strings are split into tokens.
fix If exact token output matters to your application, write your tests against the 3.14+ behavior rather than against the native tokenizer of an older interpreter.
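The interpreter's built-in `tokenize` module shows this version dependence. This stdlib-only sketch (the helper name `stdlib_tokens` is hypothetical) lists the native tokens and checks one behavior that changed in Python 3.12 under PEP 701: f-strings are reported as FSTRING_START ... FSTRING_END sequences instead of a single STRING token. It does not call pytokens, whose token type names may differ.

```python
import io
import sys
import tokenize
from typing import List, Tuple


def stdlib_tokens(source: str) -> List[Tuple[str, str]]:
    """Tokenize `source` with this interpreter's own `tokenize` module.

    Returns (token type name, token text) pairs, e.g. ("NAME", "x").
    """
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


if __name__ == "__main__":
    # True on Python 3.12+, False on older interpreters.
    names = {name for name, _ in stdlib_tokens('f"{x}"\n')}
    print(tuple(sys.version_info[:2]), "FSTRING_START" in names)
```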
breaking A script that contains shell commands (e.g., `echo` or the `>` redirection) raises a `SyntaxError` when it is executed by the Python interpreter, because Python does not understand shell syntax. This typically happens when shell commands are placed in a `.py` file or when a shell script is run with `python`.
fix Keep Python files to valid Python syntax. Run shell commands in a shell, either typed directly or placed in a separate shell script executed by a shell interpreter. If they must be launched from Python code, use the `subprocess` module, passing `shell=True` only when the command relies on shell features such as `>` redirection.
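Following the `subprocess` advice above, the two shell steps of the quickstart can be done from Python without `shell=True`: write the file with `pathlib` and pass the command as an argument list. `tokenize_with_cli` is a hypothetical helper. It assumes `python -m pytokens <file>` behaves as in the quickstart, and its `module` parameter lets the same function run the standard library's own `python -m tokenize` CLI.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def tokenize_with_cli(source: str, module: str = "pytokens") -> str:
    """Write `source` to a temporary example.py and run `python -m <module>` on it.

    Replaces the shell steps `echo ... > example.py` and
    `python -m pytokens example.py`; no shell is involved.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "example.py"
        path.write_text(source, encoding="utf-8")
        result = subprocess.run(
            [sys.executable, "-m", module, str(path)],  # argument list, no shell
            capture_output=True,
            text=True,
            check=True,  # raise CalledProcessError on a non-zero exit status
        )
    return result.stdout
```

With pytokens installed, `print(tokenize_with_cli("print('Hello, World!')\n"))` runs the quickstart input through pytokens. Passing `module="tokenize"` runs the standard library tokenizer instead, which is useful for side-by-side comparison.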
python os / libc status wheel install import disk
3.9 alpine (musl) - - 0.04s 18.0M
3.9 slim (glibc) - - 0.04s 19M
3.10 alpine (musl) - - 0.04s 18.5M
3.10 slim (glibc) - - 0.03s 19M
3.11 alpine (musl) - - 0.10s 20.4M
3.11 slim (glibc) - - 0.09s 21M
3.12 alpine (musl) - - 0.10s 12.3M
3.12 slim (glibc) - - 0.09s 13M
3.13 alpine (musl) - - 0.05s 11.9M
3.13 slim (glibc) - - 0.08s 12M

Pytokens can also be run from the command line to tokenize a Python file. The example below writes a one-line Python file and tokenizes it. The output is a stream of tokens (names, strings, operators, and so on) describing the file's contents.

echo 'print("Hello, World!")' > example.py
python -m pytokens example.py