Byte-Pair Embeddings (BPEmb)
BPEmb applies byte-pair encoding (BPE) to segment raw text into subword units and maps those units to pre-trained embeddings, covering 275 languages. It is designed for NLP tasks that need efficient subword tokenization and embedding. The current version is 0.3.6, with releases occurring sporadically as models or features are updated.
Common errors
- `urllib.error.URLError: <urlopen error [Errno 11001] getaddrinfo failed>`
  Cause: A network error prevented the download of pre-trained models: no internet connection, firewall restrictions, or incorrect proxy settings.
  Fix: Verify your internet connection and check that your firewall allows Python to make outbound connections. If behind a proxy, configure proxy settings for your environment or for Python's requests.
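If a proxy is the culprit, one stdlib-only sketch is to set the standard proxy environment variables before initializing `BPEmb`; `urllib`, which handles the download, picks them up automatically. The proxy address below is a placeholder, not a real endpoint:

```python
import os
import urllib.request

# Hypothetical proxy address; replace with your organization's proxy.
os.environ["HTTP_PROXY"] = "http://proxy.example.com:8080"
os.environ["HTTPS_PROXY"] = "http://proxy.example.com:8080"

# urllib reads these environment variables when building its proxy map.
proxies = urllib.request.getproxies()
print(proxies.get("https"))
```

Setting the variables in your shell profile instead of in code works the same way.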
- `FileNotFoundError: [Errno 2] No such file or directory: '/home/user/.cache/bpemb/en_100000_100.model'`
  Cause: The cached BPEmb model files were deleted or moved, or the cache directory is inaccessible or corrupted, so the library cannot locate them.
  Fix: Ensure the `~/.cache/bpemb` directory (or your custom cache path) is intact and accessible. If files are missing, `BPEmb` re-downloads them automatically on initialization.
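A quick stdlib-only check of the default cache location can tell you whether the files are actually present (the path below is the default layout described above; a custom directory can be supplied via the constructor's `cache_dir` argument):

```python
from pathlib import Path

# bpemb's default cache directory; BPEmb(..., cache_dir=...) overrides it.
cache_dir = Path.home() / ".cache" / "bpemb"

if cache_dir.exists():
    # List whatever model/embedding files are currently cached.
    cached = sorted(p.name for p in cache_dir.rglob("*") if p.is_file())
    print(f"{len(cached)} cached file(s) in {cache_dir}")
else:
    print(f"No cache at {cache_dir}; BPEmb will re-download models on init")
```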
- `ValueError: Language 'xx' not supported. Available languages are: ['en', 'de', ...]`
  Cause: An unsupported or incorrect language code was passed to the `BPEmb` constructor.
  Fix: Consult the `bpemb` documentation or its source code for the list of supported language codes, and use one of the official options (e.g., 'en' for English, 'es' for Spanish).
- `MemoryError`
  Cause: Loading a very large model (high `dim` and `vs` parameters) or processing an extremely large amount of text at once exceeded the available system RAM.
  Fix: Reduce the `dim` (embedding dimension) and/or `vs` (vocabulary size) parameters when initializing `BPEmb`, and process text in smaller, manageable batches to spread the memory load.
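The batching advice can be sketched with a small, bpemb-independent helper that splits any iterable of sentences into fixed-size chunks, so only one batch needs to be held in memory at a time:

```python
from itertools import islice

def batched(iterable, batch_size):
    """Yield successive lists of at most batch_size items."""
    it = iter(iterable)
    while batch := list(islice(it, batch_size)):
        yield batch

# Example: 10 sentences in batches of 4.
sentences = [f"sentence {i}" for i in range(10)]
batches = list(batched(sentences, 4))
print([len(b) for b in batches])  # → [4, 4, 2]
```

Inside the loop you would call `encode`/`embed` on each batch and discard the results (or write them to disk) before moving on.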
Warnings
- Gotcha: The `BPEmb` constructor triggers a large model download (hundreds of MB to GB) for each unique (language, dimension, vocabulary size) combination on first use. This can consume significant disk space and bandwidth.
- Gotcha: Loaded models can consume significant RAM (hundreds of MB or more) depending on the chosen `dim` and `vs` parameters, potentially leading to `MemoryError` on systems with limited resources.
- Gotcha: Out-of-vocabulary (OOV) words are handled differently by `encode` (which performs subword segmentation) and `embed` (which returns a zero vector for unknown words); don't expect unified behavior.
Install
```shell
pip install bpemb
```
Imports
- BPEmb
```python
from bpemb import BPEmb
```
Quickstart
```python
from bpemb import BPEmb

# Initialize BPEmb for English: 100-dim embeddings, 100,000-subword vocabulary.
# The model is downloaded on first run and cached under ~/.cache/bpemb.
bpemb_en = BPEmb(lang="en", dim=100, vs=100000)

# Encode a sentence into subword units
encoded_sentence = bpemb_en.encode("This is a test sentence for bpemb.")
print(f"Encoded sentence: {encoded_sentence}")

# Embed a word: embed() returns one vector per subword, shape (n_subwords, dim)
embedding = bpemb_en.embed("test")
print(f"Embedding shape for 'test': {embedding.shape}")

# Embed several words by calling embed() on each one
embeddings = [bpemb_en.embed(word) for word in ["this", "is", "bpemb"]]
print(f"Number of embedded words: {len(embeddings)}")
```