razdel

Version 0.5.0 · verified Fri May 01 · auth: no · Python

Rule-based tokenizer and sentence segmenter for Russian text: it splits text into tokens, sentences, and sections. Version 0.5.0 is the latest release; the API is stable and updates are infrequent.

pip install razdel
error ModuleNotFoundError: No module named 'razdel'
cause The razdel package is not installed in the Python environment you are running.
fix
pip install razdel
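If `pip install razdel` appears to succeed but the import still fails, pip has often installed into a different interpreter than the one running your code. Below is a minimal diagnostic sketch using only the standard library (`importlib.util.find_spec`, `importlib.metadata.version`). The helper names `installed_version` and `diagnose` are hypothetical, and the sketch assumes the distribution name matches the module name `razdel`.

```python
import importlib.metadata
import importlib.util
import sys
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if this
    interpreter cannot find its metadata."""
    try:
        return importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        return None


def diagnose(module_name: str) -> str:
    """Explain whether `module_name` is importable from the running interpreter."""
    if importlib.util.find_spec(module_name) is None:
        # Installing with this interpreter's own pip avoids the
        # "installed into a different Python" mismatch.
        return (f"{module_name} is not importable from {sys.executable}; "
                f"try: {sys.executable} -m pip install {module_name}")
    version = installed_version(module_name) or "unknown version"
    return f"{module_name} {version} is importable"


if __name__ == "__main__":
    print(diagnose("razdel"))
```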
error FileNotFoundError: [Errno 2] No such file or directory: 'razdel/data/...'
cause Required data files are missing; the installation may be corrupted or incomplete.
fix
Reinstall razdel: pip install --force-reinstall razdel
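Before or after reinstalling, you can check whether the data directory from the error actually exists by locating the installed package and inspecting it. This is a sketch that assumes razdel is a regular package and that its data lives in a `data` subdirectory (taken from the `razdel/data` path in the error message); `package_dir` and `missing_subdirs` are hypothetical helpers, not part of razdel.

```python
import importlib.util
from pathlib import Path
from typing import List, Optional


def package_dir(name: str) -> Optional[Path]:
    """Return the directory of an installed package, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        # Not installed, or a single-file module rather than a package.
        return None
    return Path(next(iter(spec.submodule_search_locations)))


def missing_subdirs(name: str, expected: List[str]) -> List[str]:
    """List which of the expected subdirectories are absent from the package."""
    root = package_dir(name)
    if root is None:
        return list(expected)
    return [sub for sub in expected if not (root / sub).is_dir()]


if __name__ == "__main__":
    # "data" is assumed from the razdel/data path in the error; adjust if needed.
    print(missing_subdirs("razdel", ["data"]))
```

An empty list means every expected directory is present; a non-empty one points to an incomplete install.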
breaking Data files under razdel/data are required at runtime, but they may be missing if pip installed razdel from an incomplete build rather than a proper wheel. Make sure these files are present in your environment.
fix Install with a plain `pip install razdel` (no extra flags), then verify the install by running `from razdel import tokenize` and calling `tokenize` on a sample string. If it reports missing data, reinstall or install from the GitHub source.
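The verification step described in this fix can be scripted. A sketch, assuming only that `tokenize` yields objects with a `.text` attribute (as razdel's `Substring` results do); `smoke_test` is a hypothetical name.

```python
from typing import Tuple


def smoke_test(sample: str = "Привет! Как дела?") -> Tuple[bool, str]:
    """Import razdel and tokenize a sample string; report success or the error."""
    try:
        from razdel import tokenize
    except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
        return False, f"import failed: {exc}"
    try:
        tokens = [token.text for token in tokenize(sample)]
    except OSError as exc:  # FileNotFoundError is a subclass of OSError
        return False, f"tokenize failed: {exc}"
    if not tokens:
        return False, "tokenize returned no tokens"
    return True, f"ok: {tokens}"


if __name__ == "__main__":
    ok, message = smoke_test()
    print(message)
```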
deprecated The `sectionize` function is experimental and may be removed or changed in future versions.
fix Avoid relying on `sectionize` in production code; use `tokenize` and `sentenize` instead.
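Working only with `tokenize` and `sentenize`, a frequent need is tokens grouped per sentence. Because razdel reports character offsets (`start`, `stop`) for every span, the sketch below groups a single pass of `tokenize` output by sentence without re-tokenizing each sentence. The `Span` tuple is a hypothetical stand-in mirroring the fields of razdel's `Substring`, and `group_tokens_by_sentence` is an illustrative helper, not a razdel function.

```python
from typing import Iterable, List, NamedTuple


class Span(NamedTuple):
    # Hypothetical stand-in with the same fields as razdel's Substring.
    start: int
    stop: int
    text: str


def group_tokens_by_sentence(sentences: Iterable, tokens: Iterable) -> List[List[str]]:
    """Assign each token to the sentence whose [start, stop) range contains its start.

    Both inputs must be ordered by offset, which is how razdel yields them.
    Any object with start/stop/text attributes works, including Substring.
    """
    groups: List[List[str]] = []
    token_iter = iter(tokens)
    pending = next(token_iter, None)
    for sentence in sentences:
        group: List[str] = []
        # Drop tokens that begin before this sentence (i.e. outside any sentence).
        while pending is not None and pending.start < sentence.start:
            pending = next(token_iter, None)
        # Collect every token that begins inside this sentence.
        while pending is not None and pending.start < sentence.stop:
            group.append(pending.text)
            pending = next(token_iter, None)
        groups.append(group)
    return groups
```

With razdel installed, call it as `group_tokens_by_sentence(sentenize(text), tokenize(text))`. Both iterators are consumed in one forward pass, so the cost is linear in the number of spans.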

Basic usage: tokenize and sentenize a short Russian text.

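from razdel import tokenize, sentenize

text = "Привет! Как дела?"
# Both functions yield Substring objects with start, stop and text attributes
tokens = list(tokenize(text))
sentences = list(sentenize(text))
# Print only the matched text of each span
print([token.text for token in tokens])
print([sentence.text for sentence in sentences])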