untokenize

0.1.1 · maintenance · verified Sun Apr 12

The `untokenize` library (version 0.1.1) transforms a stream of Python tokens back into source code. Its primary distinction from the standard library's `tokenize.untokenize()` is that it preserves the original whitespace between tokens, a common limitation of the standard function. Released in February 2014, the library is stable for its intended purpose but is no longer under active development.
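The standard library's limitation is easiest to see without any third-party code: when `tokenize.untokenize()` is given `(type, string)` 2-tuples instead of full 5-tuples, it falls back to a compatibility mode that normalizes spacing. A minimal sketch (the sample source string is hypothetical):

```python
import io
import tokenize

source = "def hello(  name ):\n    pass\n"

# Keep only (type, string) pairs -- token positions are discarded
pairs = [(tok.type, tok.string)
         for tok in tokenize.generate_tokens(io.StringIO(source).readline)]

# Compatibility mode: the result tokenizes to the same stream,
# but the original spacing inside the parentheses is not preserved
rebuilt = tokenize.untokenize(pairs)
print(rebuilt)
```

The rebuilt code is functionally equivalent but not byte-identical to the input, which is the gap the `untokenize` library addresses.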

Warnings
No releases since February 2014; the project is stable but unmaintained, so do not expect updates for newer tokenizer behavior.

Install
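The package is published on PyPI; a typical installation with pip:

```shell
pip install untokenize
```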

Imports
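The library exposes a single top-level module; the quickstart below also relies on the standard library's `io` and `tokenize`:

```python
import io
import tokenize

import untokenize
```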

Quickstart

This example demonstrates how to use `untokenize` to reconstruct source code from a token stream, ensuring original whitespace is preserved. It uses `tokenize.tokenize` from the standard library to generate tokens, then passes them to `untokenize.untokenize`.

import tokenize
import io
import untokenize

source_code_with_whitespace = "def hello(  name ):\n    print(f'Hello, {name}!') # A comment"

# Tokenize the source code using the standard library's generator
# The readline callable should return bytes for tokenize.tokenize
tokens_generator = tokenize.tokenize(io.BytesIO(source_code_with_whitespace.encode('utf-8')).readline)

# Convert generator to a list of tokens
tokens = list(tokens_generator)

# Use untokenize library to reconstruct source code, preserving whitespace
reconstructed_code = untokenize.untokenize(tokens)

print("Original:")
print(source_code_with_whitespace)
print("\nReconstructed (with untokenize library):")
print(reconstructed_code)

# For comparison, the standard library's untokenize may normalize spacing;
# note it returns bytes when the stream starts with an ENCODING token
# from tokenize import untokenize as std_untokenize
# std_reconstructed_code = std_untokenize(tokens)
# print("\nReconstructed (with stdlib tokenize.untokenize):")
# print(std_reconstructed_code)
