Tokenizers: Fast State-of-the-Art Tokenizers
Tokenizers is a fast, versatile tokenization library, implemented in Rust with Python bindings and designed for both research and production use. The current version is 0.22.2, released on January 5, 2026. The library is actively maintained and receives regular releases.
Warnings
- breaking: Installation can fail on Python 3.13 because of compatibility issues.
- gotcha: Import from the tokenizers package (from tokenizers import Tokenizer) to avoid an ImportError.
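Both warnings can be checked before any import is attempted. The following is a minimal sketch that uses only the standard library; check_tokenizers_environment is a hypothetical helper name chosen for illustration and is not part of the tokenizers package.

```python
import importlib.util
import sys


def check_tokenizers_environment() -> dict:
    """Report the interpreter version and whether `tokenizers` can be imported.

    Hypothetical helper for illustration; not part of the tokenizers package.
    """
    # find_spec returns None when no top-level package with this name is installed,
    # so this does not raise ImportError the way a direct import would.
    spec = importlib.util.find_spec("tokenizers")
    return {
        "python_version": f"{sys.version_info.major}.{sys.version_info.minor}",
        "tokenizers_installed": spec is not None,
    }


report = check_tokenizers_environment()
print(report)
```

If tokenizers_installed is False, the package is missing from the active environment, which is the usual cause of an ImportError on the import line shown under Imports.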
Install
pip install tokenizers
Imports
- Tokenizer
from tokenizers import Tokenizer
Quickstart
from tokenizers import Tokenizer
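# Load a pretrained tokenizer (fetched from the Hugging Face Hub, so this needs network access on first use)
tokenizer = Tokenizer.from_pretrained('bert-base-uncased')
# Encode a piece of text; the result is an Encoding object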
output = tokenizer.encode('Hello, world!')
print(output.tokens)
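The quickstart downloads a pretrained tokenizer, which requires network access. As an offline alternative, the sketch below builds a toy word-level tokenizer from an in-memory corpus. The corpus, the [UNK] token choice, and the WordLevel model are assumptions made for illustration, not a recommended configuration.

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import WordLevelTrainer

# Toy corpus, assumed for illustration only
corpus = ["Hello, world!", "Hello there, tokenizers."]

# A word-level model maps each whole token to an id; unknown tokens map to [UNK]
tokenizer = Tokenizer(WordLevel(unk_token="[UNK]"))

# Split text into words and punctuation runs before the model sees it
tokenizer.pre_tokenizer = Whitespace()

# Build the vocabulary from the corpus, reserving an id for [UNK]
trainer = WordLevelTrainer(special_tokens=["[UNK]"])
tokenizer.train_from_iterator(corpus, trainer=trainer)

output = tokenizer.encode("Hello, world!")
print(output.tokens)  # the token strings
print(output.ids)     # the matching vocabulary ids, one per token
```

Because no post-processor is attached here, the encoding contains only the tokens found in the input, without added special tokens such as [CLS] or [SEP].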