Curated Tokenizers
Curated Tokenizers is a lightweight Python library from Explosion, the creators of spaCy, that provides efficient, production-ready implementations of several piece (subword) tokenization algorithms, including Byte-Pair Encoding (BPE), WordPiece, and SentencePiece. It focuses on fast, reliable tokenization that can be integrated into larger NLP pipelines. The library is currently at version 2.0.0; releases are infrequent but ongoing and concentrate on performance and stability.
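To give a sense of what "piece tokenization" means, here is a minimal pure-Python sketch of greedy longest-match WordPiece splitting. It is an illustration only and does not use or reproduce the library's API or implementation; the function name `wordpiece_tokenize`, the `##` continuation prefix, the `[UNK]` marker, and the toy vocabulary are all assumptions made for the example.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]", prefix="##"):
    """Split one word into pieces by greedy longest-match-first lookup.

    Hypothetical helper for illustration; not part of curated-tokenizers.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Pieces that do not start the word carry a continuation prefix.
                candidate = prefix + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece covers this position: the whole word is unknown.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Toy vocabulary (an assumption for this sketch, not a real model vocabulary).
toy_vocab = {"token", "##izer", "##s", "un", "##believ", "##able"}

print(wordpiece_tokenize("tokenizers", toy_vocab))
print(wordpiece_tokenize("unbelievable", toy_vocab))
```

A production tokenizer would additionally map each piece to an integer ID and handle pre-tokenization (whitespace and punctuation splitting) before this step; those parts are omitted here.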
Resources
API endpoints
full doc /v1/registry/curated-tokenizers
compatibility /v1/registry/curated-tokenizers/compatibility