SentencePiece
SentencePiece is an unsupervised text tokenizer and detokenizer, designed primarily for neural network-based text generation systems in which the vocabulary size is fixed before model training. It implements two subword segmentation algorithms, byte-pair encoding (BPE) and the unigram language model, and can be trained directly from raw sentences without pre-tokenization. The library is actively maintained; the current version is 0.2.1.
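To make the idea of learning subword units from raw sentences concrete, here is a minimal pure-Python sketch of BPE merge learning. It is a toy illustration, not SentencePiece's implementation or API: the function names (`to_symbols`, `learn_bpe`, `encode`, `decode`) are hypothetical, the tie-breaking rule is an arbitrary choice, and the only details borrowed from SentencePiece are the "▁" (U+2581) whitespace marker and the assumption that a piece may carry that marker only as its first character.

```python
from collections import Counter

SPACE = "\u2581"  # "▁", the visible whitespace marker used by SentencePiece


def to_symbols(sentence):
    """Split a raw sentence into single characters, writing spaces as "▁"."""
    return list(SPACE + sentence.replace(" ", SPACE))


def merge_pair(symbols, pair):
    """Replace each non-overlapping occurrence of `pair` with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged


def learn_bpe(sentences, num_merges):
    """Greedily learn up to `num_merges` merges from raw sentences (toy version)."""
    corpus = [to_symbols(s) for s in sentences]
    merges = []
    for _ in range(num_merges):
        counts = Counter()
        for symbols in corpus:
            # Assumption: never merge across a word start, so "▁" stays a prefix.
            counts.update(
                p for p in zip(symbols, symbols[1:]) if not p[1].startswith(SPACE)
            )
        if not counts:
            break
        # Most frequent pair wins; ties are broken by the pair itself (arbitrary).
        best = max(counts, key=lambda p: (counts[p], p))
        merges.append(best)
        corpus = [merge_pair(s, best) for s in corpus]
    return merges


def encode(sentence, merges):
    """Apply the learned merges, in order, to a new raw sentence."""
    symbols = to_symbols(sentence)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


def decode(pieces):
    """Invert `encode`: join the pieces and turn "▁" back into spaces."""
    return "".join(pieces).replace(SPACE, " ")[1:]


if __name__ == "__main__":
    merges = learn_bpe(["low lower lowest", "new newer newest"], num_merges=10)
    pieces = encode("lower newest", merges)
    print(pieces)
    print(decode(pieces))
```

Because whitespace is kept as an ordinary symbol, joining the pieces and replacing the marker recovers the original sentence exactly, which is why no language-specific pre-tokenizer is needed. The sketch assumes the input text does not itself contain "▁".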
API endpoints
full doc /v1/registry/sentencepiece
compatibility /v1/registry/sentencepiece/compatibility