untokenize
The `untokenize` library (version 0.1.1) transforms a stream of Python tokens back into source code. Its primary distinction from the standard library's `tokenize.untokenize()` is that it preserves the original whitespace between tokens, so the reconstructed source matches the original formatting. Released in February 2014, the library is stable for its intended purpose but is no longer under active development.
Warnings
- gotcha The `untokenize` library's last release was in February 2014, indicating it is no longer actively maintained. While its core functionality is simple and stable, it will not receive updates for new Python syntax features or potential edge cases that arise with future Python versions.
- gotcha This library provides an `untokenize` function that is distinct from Python's standard `tokenize.untokenize`. The key difference is that this library preserves the original whitespace between tokens, which the standard library function does not reliably do (in particular, its two-element "compat" mode normalizes spacing), so the two can produce differently formatted output.
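To see the stdlib behavior this warning describes, here is a minimal stdlib-only sketch (no third-party packages): when `tokenize.untokenize` receives two-element `(type, string)` tuples, it falls back to a compatibility mode that discards the original spacing.

```python
import io
import tokenize

src = "def hello( name ):  pass\n"

# Keep only (type, string) pairs, which forces tokenize.untokenize()
# into its two-element "compat" mode.
pairs = [(tok.type, tok.string)
         for tok in tokenize.generate_tokens(io.StringIO(src).readline)]

rebuilt = tokenize.untokenize(pairs)
print(repr(rebuilt))  # spacing between tokens is normalized, not preserved
assert rebuilt != src
```

With full five-element tokens the stdlib pads whitespace from the recorded column positions and usually round-trips same-line spacing; the `untokenize` library targets the cases where that is not enough.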
Install
-
pip install untokenize
Imports
- untokenize
import untokenize
Quickstart
import tokenize
import io
import untokenize
# Irregular spacing is deliberate; the trailing newline keeps tokenize happy
source_code_with_whitespace = "def hello( name ):\n    print( f'Hello, {name}!' )  # A comment\n"
# Tokenize the source code using the standard library's generator.
# tokenize.tokenize() expects a readline callable that returns bytes.
source_bytes = io.BytesIO(source_code_with_whitespace.encode('utf-8'))
tokens = list(tokenize.tokenize(source_bytes.readline))
# Use untokenize library to reconstruct source code, preserving whitespace
reconstructed_code = untokenize.untokenize(tokens)
print("Original:")
print(source_code_with_whitespace)
print("\nReconstructed (with untokenize library):")
print(reconstructed_code)
# For comparison, the standard library's untokenize. Because the token
# stream starts with an ENCODING token, it returns bytes, not str:
# from tokenize import untokenize as std_untokenize
# std_reconstructed_code = std_untokenize(tokens).decode('utf-8')
# print("\nReconstructed (with stdlib tokenize.untokenize):")
# print(std_reconstructed_code)
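Whichever reconstruction you use, a quick sanity check is to confirm the output tokenizes identically to the input, which is the only round-trip guarantee the stdlib itself makes. A minimal stdlib-only sketch; the helper name `tokens_match` is our own, not part of either library:

```python
import io
import tokenize

def tokens_match(a, b):
    """True if two sources yield identical (type, string) token streams."""
    def stream(source):
        return [(tok.type, tok.string)
                for tok in tokenize.generate_tokens(io.StringIO(source).readline)]
    return stream(a) == stream(b)

original = "def hello( name ):\n    print( name )\n"

# Rebuild via the stdlib's two-element "compat" mode: the text changes,
# but the token stream must survive intact.
rebuilt = tokenize.untokenize(
    (tok.type, tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(original).readline))

print(rebuilt != original)              # True: formatting was not preserved
print(tokens_match(original, rebuilt))  # True: but the token stream is identical
```

The same check applied to this library's output should show both identical tokens and identical text.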