tokenize-rt
tokenize-rt is a Python library that wraps the standard library's `tokenize` module to guarantee lossless round-tripping of Python source code. It extends the standard token set with `ESCAPED_NL` and `UNIMPORTANT_WS` tokens so that every character of the source is represented, making it especially useful for refactoring tools that must preserve whitespace and exact source representation. The library is actively maintained, with version 6.2.0 released on May 23, 2025, and generally follows a regular release cadence.
Warnings
- gotcha tokenize-rt intentionally introduces additional token types (`ESCAPED_NL`, `UNIMPORTANT_WS`) so that line continuations and intra-line whitespace are represented by explicit tokens. Its token stream therefore differs from that produced by the standard library's `tokenize` module, and code that compares against or expects identical stdlib token streams must account for these extra tokens.
- gotcha Unlike the standard library's `tokenize.tokenize`, which expects a callable `readline` function returning lines as *bytes*, `src_to_tokens` takes the entire source as a single `str`. When reading Python source from a file, read the whole file into a string (for example with `open(path, encoding='utf-8').read()`) and pass that string to `src_to_tokens`.
- breaking The standard library's `tokenize` module changed how f-strings are tokenized between Python 3.11 and Python 3.12 due to the formalization of PEP 701. While `tokenize-rt` aims to round-trip reliably, tools built on fine-grained f-string token introspection might need careful review when moving between these Python versions.
Install
- pip install tokenize-rt
Imports
- src_to_tokens
from tokenize_rt import src_to_tokens
- tokens_to_src
from tokenize_rt import tokens_to_src
- Token
from tokenize_rt import Token
- ESCAPED_NL
from tokenize_rt import ESCAPED_NL
- UNIMPORTANT_WS
from tokenize_rt import UNIMPORTANT_WS
Quickstart
from tokenize_rt import src_to_tokens, tokens_to_src

def roundtrip_code(code: str) -> str:
    tokens = src_to_tokens(code)
    # You can now inspect or modify 'tokens'
    # For example, let's print them
    for token in tokens:
        print(f"{token.name}: {token.src!r}")
    # Convert back to source
    return tokens_to_src(tokens)

example_code = (
    'def foo(bar):\n'
    '    if bar:  # a comment\n'
    '        return "hello world"\n'
)
roundtripped_code = roundtrip_code(example_code)
print("\nOriginal Code:\n", example_code)
print("\nRoundtripped Code:\n", roundtripped_code)
assert example_code == roundtripped_code