RustBPE Tokenizer

library 0.1.0 · python
verified May 24, 2026

RustBPE is a Python library that provides a fast Byte Pair Encoding (BPE) tokenizer implemented in Rust and exposed through Python bindings. It is designed primarily for training GPT-style BPE tokenizers, and it offers parallel processing, GPT-4 style regex pre-tokenization, and direct export to the tiktoken format for efficient inference. The current version, 0.1.0, is an initial release, so the interface may still change in later versions.
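
To make the description above concrete, here is a minimal pure-Python sketch of the two ideas it names: BPE training over regex-split chunks, and serialization of the resulting vocabulary in the tiktoken text format (one line per token: the base64-encoded token bytes, a space, and the rank). This is not RustBPE's API. The function names train_bpe, mergeable_ranks, and to_tiktoken_lines are hypothetical, and the split pattern is a simplified ASCII stand-in for the GPT-4 pattern, which relies on Unicode classes that Python's built-in re module does not support.

```python
import base64
import re
from collections import Counter

# Simplified stand-in for a GPT-style split pattern (assumption: ASCII only).
SPLIT_PATTERN = re.compile(r" ?[A-Za-z]+| ?[0-9]+| ?[^\sA-Za-z0-9]+|\s+")


def train_bpe(text, vocab_size):
    """Learn BPE merges from text; returns a list of (left, right) byte pairs."""
    assert vocab_size >= 256, "the 256 single bytes are always in the vocabulary"
    # Pre-tokenize, count each distinct chunk once, and split it into bytes.
    chunk_counts = Counter(SPLIT_PATTERN.findall(text))
    words = {
        tuple(bytes([b]) for b in chunk.encode("utf-8")): count
        for chunk, count in chunk_counts.items()
    }
    merges = []
    while 256 + len(merges) < vocab_size:
        # Count adjacent symbol pairs, weighted by chunk frequency.
        pair_counts = Counter()
        for symbols, count in words.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += count
        if not pair_counts:
            break  # nothing left to merge
        # Most frequent pair; ties are broken by the pair itself for determinism.
        best = max(pair_counts, key=lambda p: (pair_counts[p], p))
        merges.append(best)
        # Replace every occurrence of the best pair, scanning left to right.
        merged_words = {}
        for symbols, count in words.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            key = tuple(out)
            merged_words[key] = merged_words.get(key, 0) + count
        words = merged_words
    return merges


def mergeable_ranks(merges):
    """Map token bytes to integer ranks: single bytes first, then merges in order."""
    ranks = {bytes([i]): i for i in range(256)}
    for left, right in merges:
        token = left + right
        if token not in ranks:  # guard against a repeated byte string
            ranks[token] = len(ranks)
    return ranks


def to_tiktoken_lines(ranks):
    """Serialize ranks as tiktoken-format lines: base64(token) + space + rank."""
    ordered = sorted(ranks.items(), key=lambda item: item[1])
    return [f"{base64.b64encode(tok).decode('ascii')} {rank}" for tok, rank in ordered]


if __name__ == "__main__":
    learned = train_bpe("low lower lowest low low", vocab_size=258)
    for line in to_tiktoken_lines(mergeable_ranks(learned))[256:]:
        print(line)
```

Counting each distinct chunk once and weighting its pairs by frequency keeps the loop proportional to the number of unique chunks rather than to the corpus size. A Rust implementation with parallel processing would be expected to speed up this same kind of counting and merging step, though the sketch makes no claim about how RustBPE does it internally.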
