Tokenizers: Fast State-of-the-Art Tokenizers

0.22.2 · active · verified Sat Mar 28

Tokenizers is a Hugging Face library, implemented in Rust with Python bindings, that provides fast and versatile tokenization tools for both research and production use. The current version is 0.22.2, released on January 5, 2026. The library is actively maintained.

Quickstart

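The example below loads a pretrained tokenizer and tokenizes a sample text. Tokenizer.from_pretrained fetches the tokenizer definition from the Hugging Face Hub, so the first call needs network access.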

from tokenizers import Tokenizer

# Load a pretrained tokenizer
tokenizer = Tokenizer.from_pretrained('bert-base-uncased')

# Tokenize a text
output = tokenizer.encode('Hello, world!')
print(output.tokens)
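
For experimenting with the same encode call without downloading anything, a tokenizer can also be assembled in memory. The following is a minimal sketch, not a replacement for a pretrained model: the toy vocab dictionary is invented purely for illustration, and a word-level model with whitespace pre-tokenization is assumed. It also shows the ids and offsets fields of the returned encoding alongside tokens.

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace

# Hypothetical toy vocabulary (illustration only, not from any real model)
vocab = {"[UNK]": 0, "hello": 1, ",": 2, "world": 3, "!": 4}

# Word-level model: each pre-tokenized piece is looked up directly in vocab;
# anything missing maps to the unknown token
tokenizer = Tokenizer(WordLevel(vocab=vocab, unk_token="[UNK]"))

# Split the input on whitespace and punctuation before the vocabulary lookup
tokenizer.pre_tokenizer = Whitespace()

encoding = tokenizer.encode("hello, world!")
print(encoding.tokens)   # token strings
print(encoding.ids)      # integer ids from vocab
print(encoding.offsets)  # (start, end) character spans in the input

# A word outside the toy vocabulary falls back to [UNK]
unknown = tokenizer.encode("hello there")
print(unknown.tokens)
```

No normalizer is attached in this sketch, so the input is matched case-sensitively; "Hello" would map to [UNK] here, whereas bert-base-uncased lowercases its input first.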
