Curated Tokenizers

JSON →
library 2.0.0 ·python
verified May 22, 2026

Curated Tokenizers is a lightweight Python library by Explosion (creators of spaCy) that provides efficient and production-ready implementations of various piece tokenization algorithms, including Byte-Pair Encoding (BPE), WordPiece, and SentencePiece. It focuses on fast, reliable tokenization suitable for integrating into larger NLP pipelines. The library is currently at version 2.0.0, with an active but less frequent release cadence focused on performance and stability.

total hits 24
actors 7 distinct systems
last hit 1d ago AhrefsBot
ByteDance
12
GPTBot
2
Script
2
Search engines
2
Humans
1

top countries 🇸🇬 Singapore · 🇺🇸 United States · 🇫🇷 France · 🇩🇪 Germany · 🇨🇦 Canada