Curated Tokenizers

2.0.0 · active · verified Mon Apr 13

Curated Tokenizers is a lightweight Python library by Explosion (the creators of spaCy) that provides fast, production-ready implementations of common subword ("piece") tokenization schemes, including Byte-Pair Encoding (BPE), WordPiece, and SentencePiece. It focuses on reliable, efficient tokenization that slots cleanly into larger NLP pipelines. The library is currently at version 2.0.0, with an active but infrequent release cadence focused on performance and stability.
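To make the BPE idea concrete, here is a minimal pure-Python sketch of the merge loop: repeatedly apply learned merge rules, in priority order, to a list of symbols. This illustrates the algorithm only; it is not the library's implementation, and `bpe_encode` is a hypothetical helper written for this example.

```python
def bpe_encode(word, merges):
    """Apply BPE merge rules to the characters of `word` until none apply.

    `merges` is an ordered list of symbol pairs; a lower index means the
    merge was learned earlier and therefore has higher priority.
    """
    symbols = list(word)
    ranks = {pair: i for i, pair in enumerate(merges)}
    while len(symbols) > 1:
        # Find the adjacent pair with the best (lowest) merge rank.
        pairs = [(ranks.get((a, b), float("inf")), i)
                 for i, (a, b) in enumerate(zip(symbols, symbols[1:]))]
        rank, i = min(pairs)
        if rank == float("inf"):
            break  # no applicable merge remains
        # Merge the winning pair into a single symbol.
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols

merges = [("a", "b"), ("ab", "c")]
print(bpe_encode("abc", merges))   # -> ['abc']
print(bpe_encode("acb", merges))   # -> ['a', 'c', 'b']
```

Note how "acb" is left as single characters: neither merge rule matches an adjacent pair, which is exactly how BPE falls back to smaller pieces for unseen sequences.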

Warnings

Install

pip install curated-tokenizers

Imports

from curated_tokenizers import ByteBPEProcessor

Quickstart

This quickstart shows how to instantiate and use a `ByteBPEProcessor` to encode and decode text. The example builds a toy processor in memory; in typical usage you load a pre-trained model from files (for byte-level BPE, a `vocab.json` and `merges.txt`) via the processor's file-loading class methods. Exact constructor and method signatures can vary between releases, so treat the snippet below as a sketch and consult the API documentation for your installed version.

from curated_tokenizers import ByteBPEProcessor

# Create a minimal, in-memory ByteBPE processor for demonstration.
# In a real application, you would load pre-trained vocabulary and merge
# files instead of constructing these by hand.

# A toy vocabulary mapping pieces to integer IDs.
vocab = {"a": 1, "b": 2, "c": 3, "ab": 4, "abc": 5}

# Merge rules, applied in order: "a" + "b" -> "ab", then "ab" + "c" -> "abc".
merges = [("a", "b"), ("ab", "c")]

# Instantiate the processor from the vocabulary and merge list.
processor = ByteBPEProcessor(vocab, merges)

text = "abc"
print(f"Original text: '{text}'")

# Encode the text; the processor returns the piece IDs and the pieces.
ids, pieces = processor.encode(text)
print(f"Encoded IDs: {ids}")
print(f"Pieces: {pieces}")

# Decode the IDs back into a string.
decoded_text = processor.decode_from_ids(ids)
print(f"Decoded text: '{decoded_text}'")
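Conceptually, once the merge rules have produced pieces, encoding is a vocabulary lookup and decoding is the reverse lookup followed by concatenation. A pure-Python illustration of that step, using a toy vocabulary like the one in the quickstart (this is not the library's implementation):

```python
# Toy piece-to-ID vocabulary, assumed for this illustration.
vocab = {"a": 1, "b": 2, "c": 3, "ab": 4, "abc": 5}
# Build the reverse mapping for decoding.
id_to_piece = {v: k for k, v in vocab.items()}

# Suppose the BPE merge step reduced "abc" to a single piece.
pieces = ["abc"]
ids = [vocab[p] for p in pieces]                 # -> [5]
decoded = "".join(id_to_piece[i] for i in ids)   # -> "abc"
print(ids, decoded)
```

The round trip recovers the original string because every piece in the output is present in the vocabulary; real processors additionally handle byte-level fallback for pieces that are not.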
