{"id":454,"library":"tokenizers","title":"Tokenizers: Fast State-of-the-Art Tokenizers","description":"Tokenizers is a Python library, implemented in Rust, that provides fast and versatile tokenization for both research and production use. The current version is 0.22.2, released on January 5, 2026. The library is actively maintained.","status":"active","version":"0.22.2","language":"python","source_language":"en","source_url":"https://github.com/huggingface/tokenizers","tags":["tokenization","NLP","Hugging Face","Python"],"install":[{"cmd":"pip install tokenizers","lang":"bash","label":"Install Tokenizers"}],"dependencies":[{"reason":"Required only when building from source; not needed when pip installs a prebuilt wheel","package":"rust","optional":true}],"imports":[{"note":"Import Tokenizer from the top-level tokenizers package","symbol":"Tokenizer","correct":"from tokenizers import Tokenizer"}],"quickstart":{"code":"from tokenizers import Tokenizer\n\n# Load a pretrained tokenizer from the Hugging Face Hub (requires network access)\ntokenizer = Tokenizer.from_pretrained('bert-base-uncased')\n\n# Tokenize a text and print the resulting tokens\noutput = tokenizer.encode('Hello, world!')\nprint(output.tokens)","lang":"python","description":"Loads a pretrained tokenizer and prints the tokens for a sample text."},"warnings":[{"fix":"The install checks in this record pass on Python 3.13 (alpine and slim). If installation still fails on Python 3.13 with a PyO3 compatibility error, use Python 3.12 or earlier.","message":"Possible Python 3.13 compatibility issues during installation","severity":"breaking","affected_versions":"0.22.2"},{"fix":"Use 'from tokenizers import Tokenizer' to import the Tokenizer class.","message":"Importing Tokenizer from an incorrect path raises ImportError","severity":"gotcha","affected_versions":"All"}],"env_vars":null,"last_verified":"2026-05-12T13:55:49.322Z","next_check":"2026-06-26T00:00:00.000Z","problems":[{"fix":"pip install tokenizers","cause":"The 'tokenizers' Python package has not been installed in the current environment or the Python interpreter 
is not using the correct environment where it is installed.","error":"ModuleNotFoundError: No module named 'tokenizers'"},{"fix":"First upgrade pip (`pip install --upgrade pip`) so that a prebuilt wheel can be used if one exists for your platform. If pip still builds from source, install Rust (e.g., using `curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh`), ensure it is on your system's PATH, and retry `pip install tokenizers`.","cause":"The 'tokenizers' library contains Rust components. When pip cannot use a prebuilt wheel and builds from source, a Rust compiler must be installed and accessible on your system's PATH.","error":"Failed building wheel for tokenizers"},{"fix":"Pass `Tokenizer.from_file()` a path that points directly to a `tokenizer.json` file, or pass `Tokenizer.from_pretrained()` a correct Hugging Face model identifier.","cause":"The argument passed to `Tokenizer.from_file()` or `Tokenizer.from_pretrained()` does not resolve to a valid `tokenizer.json` file.","error":"ValueError: Path /path/to/model is not a valid tokenizers model directory."},{"fix":"Pass `tokenizer.encode_batch()` a list of strings, and pass `tokenizer.encode()` a single string. Example: `tokenizer.encode_batch(['text1', 'text2'])`.","cause":"The `encode_batch` method (or `encode` when called with multiple inputs) of the `tokenizers.Tokenizer` object received an input that was not a list of strings, or the list contained non-string elements.","error":"TypeError: Expected a list of strings, received ..."},{"fix":"Access the token IDs via `encoding.ids`, the attention mask via `encoding.attention_mask`, and the token type IDs via `encoding.type_ids`. 
For example, `input_ids = encoding.ids`.","cause":"When using the `tokenizers` library directly, the `encode()` method returns an `Encoding` object, which has different attribute names (e.g., `ids`, `attention_mask`, `type_ids`) compared to the `transformers.BatchEncoding` object.","error":"AttributeError: 'Encoding' object has no attribute 'input_ids'"}],"ecosystem":"pypi","meta_description":null,"install_score":100,"install_tag":"verified","quickstart_score":80,"quickstart_tag":"verified","pypi_latest":null,"install_checks":{"last_tested":"2026-05-12","tag":"verified","tag_description":"installs cleanly on critical runtimes, fast import, recently tested","results":[{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":0.02,"mem_mb":0.9,"disk_size":"85.0M"},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":0.01,"mem_mb":0.9,"disk_size":"68M"},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":0.04,"mem_mb":1.3,"disk_size":"90.3M"},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":0.03,"mem_mb":1.3,"disk_size":"73M"},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":0.03,"mem_mb":1,"disk_size":"81.4M"},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim 
(glibc)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":0.04,"mem_mb":1,"disk_size":"64M"},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":0.03,"mem_mb":1.1,"disk_size":"81.0M"},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":0.03,"mem_mb":0.9,"disk_size":"64M"},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":0.02,"mem_mb":0.8,"disk_size":"83.8M"},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":0.02,"mem_mb":0.8,"disk_size":"67M"}]},"quickstart_checks":{"last_tested":"2026-04-23","tag":"verified","tag_description":"quickstart runs on critical runtimes, recently tested","results":[{"runtime":"python:3.10-alpine","exit_code":0},{"runtime":"python:3.10-slim","exit_code":0},{"runtime":"python:3.11-alpine","exit_code":0},{"runtime":"python:3.11-slim","exit_code":0},{"runtime":"python:3.12-alpine","exit_code":0},{"runtime":"python:3.12-slim","exit_code":0},{"runtime":"python:3.13-alpine","exit_code":0},{"runtime":"python:3.13-slim","exit_code":0},{"runtime":"python:3.9-alpine","exit_code":0},{"runtime":"python:3.9-slim","exit_code":0}]}}