SentencePiece

library · 0.2.1 · python
verified Jun 9, 2026 install draft

SentencePiece is an unsupervised text tokenizer and detokenizer, designed primarily for neural-network-based text generation systems in which the vocabulary size is fixed before the model is trained. It implements subword segmentation algorithms, namely Byte-Pair Encoding (BPE) and the Unigram Language Model, and can be trained directly on raw sentences without pre-tokenization. The library is actively maintained; the current version is 0.2.1.
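To make "training directly from raw sentences" concrete, here is a minimal pure-Python sketch of BPE-style merge learning in which whitespace is treated as an ordinary symbol ("▁"), so no pre-tokenizer is needed and decoding is lossless. This is an illustration only, not SentencePiece's implementation or API: the function names `learn_bpe_merges`, `apply_merge`, `encode`, and `decode` are hypothetical, the tie-breaking rule is an assumption, and the sketch lets merges cross whitespace freely, which the library restricts by default.

```python
from collections import Counter

SPACE = "\u2581"  # the "▁" meta symbol used to represent whitespace


def apply_merge(symbols, pair):
    """Replace each non-overlapping occurrence of `pair` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_bpe_merges(sentences, num_merges):
    """Learn up to `num_merges` merge rules from raw, untokenized sentences."""
    # Each sentence is one character sequence; spaces become the SPACE symbol.
    corpus = [list(s.replace(" ", SPACE)) for s in sentences]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols in corpus:
            pairs.update(zip(symbols, symbols[1:]))
        if not pairs:
            break  # nothing left to merge
        # Most frequent pair; ties broken lexicographically (an assumption,
        # chosen only to keep this sketch deterministic).
        best = min(pairs, key=lambda p: (-pairs[p], p))
        merges.append(best)
        corpus = [apply_merge(symbols, best) for symbols in corpus]
    return merges


def encode(text, merges):
    """Segment raw text by replaying the learned merges in order."""
    symbols = list(text.replace(" ", SPACE))
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return symbols


def decode(pieces):
    """Concatenate pieces and restore spaces; no detokenization rules needed."""
    return "".join(pieces).replace(SPACE, " ")


merges = learn_bpe_merges(["low lower lowest", "new newer newest"], num_merges=10)
pieces = encode("lower newest", merges)
print(pieces)
print(decode(pieces))  # round-trips to the original string
```

Because whitespace survives as a symbol inside the pieces, `decode` is a plain string join. In a real vocabulary the number of merges would be chosen to reach the predetermined vocabulary size rather than passed in directly.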
