razdel
Version 0.5.0, verified Fri May 01. Auth: no. Language: Python.
Rule-based tokenizer and sentence segmenter for Russian text that splits input into tokens, sentences, and sections. Version 0.5.0 is the latest release; the API is stable and updates are infrequent.
pip install razdel
Common errors
error ModuleNotFoundError: No module named 'razdel'
cause razdel is not installed in the Python environment you are running.
fix
pip install razdel
error FileNotFoundError: [Errno 2] No such file or directory: 'razdel/data/...'
cause Missing data files; installation may have been corrupted or incomplete.
fix
Reinstall razdel: pip install --force-reinstall razdel
Warnings
breaking razdel requires its data files (razdel/data), which may be missing if the package was installed from an incomplete wheel. Make sure these files are present in your environment.
fix Install with plain pip, without extra flags, then verify by importing `from razdel import tokenize` and calling it on a sample string. If you get an error about missing data, reinstall the package or install from the GitHub source.
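The verification step above can be wrapped in a small check. This is a minimal sketch, not part of razdel: `check_razdel` is a hypothetical helper name, and it assumes (as razdel's README shows) that each token exposes a `.text` attribute. It maps the two common errors listed above to their fixes.

```python
def check_razdel(sample: str = "Привет! Как дела?") -> bool:
    """Hypothetical helper: return True if razdel imports and tokenizes `sample`."""
    try:
        from razdel import tokenize  # raises ModuleNotFoundError if not installed
        # Iterate inside the try block so errors raised lazily are also caught
        tokens = [t.text for t in tokenize(sample)]
    except ModuleNotFoundError:
        print("razdel is not installed; run: pip install razdel")
        return False
    except FileNotFoundError as exc:
        # Missing packaged files: the installation is likely incomplete
        print(f"razdel data files missing ({exc}); run: pip install --force-reinstall razdel")
        return False
    # An empty sample yields no tokens, so it reports False
    return len(tokens) > 0


if __name__ == "__main__":
    print("razdel OK" if check_razdel() else "razdel check failed")
```

Returning a boolean rather than raising keeps the check usable in setup scripts that want to print a hint and continue.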
deprecated The `sectionize` function is experimental and may be removed or changed in future versions.
fix Avoid relying on `sectionize` for production; use `tokenize` and `sentenize` instead.
Imports
- tokenize
from razdel import tokenize
- sentenize
from razdel import sentenize
Quickstart
from razdel import tokenize, sentenize
text = "Привет! Как дела?"
# tokenize() and sentenize() yield Substring objects; list() collects them
tokens = list(tokenize(text))
sentences = list(sentenize(text))
print(tokens)
print(sentences)