SentencePiece

library 0.2.1 ·python
verified Jun 9, 2026 install draft

SentencePiece is an unsupervised text tokenizer and detokenizer, designed mainly for neural-network-based text generation systems in which the vocabulary size is fixed before the model is trained. It implements subword segmentation algorithms such as byte-pair encoding (BPE) and the unigram language model, and it can be trained directly on raw sentences without pre-tokenization. The library is actively maintained; the current version is 0.2.1.
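To make the idea of BPE subword units concrete, here is a minimal pure-Python sketch of BPE merge learning on raw sentences. It is an illustration only, not SentencePiece's implementation or API: the function names (`learn_bpe_merges`, `apply_merge`, `encode`, `decode`) are hypothetical, and the sketch lets merges cross word boundaries for simplicity. The one detail borrowed from SentencePiece is escaping spaces as the meta symbol "▁" (U+2581), so that whitespace is an ordinary symbol and detokenization is lossless.

```python
from collections import Counter

SPACE = "\u2581"  # "▁", the whitespace meta symbol


def to_symbols(text):
    """Split raw text into characters, escaping spaces as the meta symbol."""
    return list(SPACE + text.replace(" ", SPACE))


def apply_merge(symbols, pair):
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged


def learn_bpe_merges(sentences, num_merges):
    """Learn up to `num_merges` merge rules from raw, untokenized sentences."""
    corpus = [to_symbols(s) for s in sentences]
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols in corpus:
            pair_counts.update(zip(symbols, symbols[1:]))
        if not pair_counts:
            break  # nothing left to merge
        # Most frequent pair wins; ties are broken by the pair itself
        # so the result is deterministic.
        best = max(pair_counts, key=lambda p: (pair_counts[p], p))
        merges.append(best)
        corpus = [apply_merge(symbols, best) for symbols in corpus]
    return merges


def encode(text, merges):
    """Segment text by replaying the learned merges in order."""
    symbols = to_symbols(text)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return symbols


def decode(pieces):
    """Invert `encode`: join pieces, restore spaces, drop the leading one."""
    return "".join(pieces).replace(SPACE, " ")[1:]


if __name__ == "__main__":
    merges = learn_bpe_merges(["low lower lowest", "low low"], num_merges=3)
    pieces = encode("lower low", merges)
    print(pieces)
    print(decode(pieces))
```

In this sketch the number of merges plays the role of the predetermined vocabulary size: each merge adds one new symbol, so a fixed merge budget fixes the vocabulary before any downstream model is trained.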
