Rust Stemmers for Python
py-rust-stemmers (version 0.1.5) is a high-performance Python wrapper around the Rust `rust-stemmers` library. It implements the Snowball stemming algorithms, offering efficient word stemming for multiple languages with support for parallel processing. The library is actively maintained, with its latest version uploaded to PyPI in February 2025 and continued development activity on GitHub through late 2025.
Warnings
- gotcha When installing from source or in environments like Docker/CI without pre-built wheels, `py-rust-stemmers` requires the Rust toolchain and `maturin` to be present for compilation. This can add complexity to build pipelines.
- gotcha Snowball stemming algorithms aim to reduce words to a common root form, which may not always be a dictionary word or a true lemma. If your application requires actual dictionary words or full lemmatization, Snowball stemmers (and thus `py-rust-stemmers`) may not be the appropriate solution.
- gotcha A `panic!` in the underlying Rust code surfaces in Python as a `pyo3_runtime.PanicException` rather than a standard Python exception. Because `PanicException` subclasses `BaseException` (mirroring Rust's treatment of panics as non-recoverable), an ordinary `except Exception` handler will not catch it, making granular error handling challenging without explicit Rust-to-Python exception mapping.
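One way to contain panics at the boundary is to convert them into a regular Python exception. The sketch below is an illustration, not part of the library's API: the helper name is an assumption, and it matches the exception by class name so it does not need to import `pyo3_runtime` (a module that only exists once a PyO3 extension has been loaded). A stand-in exception class is defined purely for demonstration.

```python
def call_stemmer_safely(fn, *args):
    """Call an extension function, converting a Rust panic
    (pyo3_runtime.PanicException) into a regular ValueError.

    Matching by class name avoids importing `pyo3_runtime`,
    which only exists after a PyO3 extension has been loaded.
    """
    try:
        return fn(*args)
    except BaseException as exc:  # PanicException subclasses BaseException
        if type(exc).__name__ == "PanicException":
            raise ValueError(f"stemmer panicked: {exc}") from exc
        raise

# Stand-in for the real exception, used here only for demonstration
class PanicException(BaseException):
    pass

def panicking_stem(word):
    raise PanicException("called `Option::unwrap()` on a `None` value")
```

Wrapping calls this way keeps a panic from escaping as an unfamiliar `BaseException` subclass deep inside application code.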
Install
- pip install py-rust-stemmers
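When no pre-built wheel is available (see the gotcha above), the Rust toolchain and `maturin` must be present before `pip install` runs. A hedged Dockerfile sketch of such a build environment follows; the base image tag and package list are assumptions, not a recommendation from the library itself.

```dockerfile
FROM python:3.12-slim

# The Rust toolchain is required to compile the extension from source
RUN apt-get update && apt-get install -y curl build-essential \
    && curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
ENV PATH="/root/.cargo/bin:${PATH}"

# maturin drives the Rust -> Python wheel build;
# --no-binary forces a source build even if a wheel exists
RUN pip install maturin \
    && pip install py-rust-stemmers --no-binary py-rust-stemmers
```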
Imports
- SnowballStemmer
from py_rust_stemmers import SnowballStemmer
Quickstart
from py_rust_stemmers import SnowballStemmer
# Initialize the stemmer for the English language
s = SnowballStemmer('english')
text = """This stem form is often a word itself, but this is not always the case as this is not a requirement for text search systems, which are the intended field of use. We also aim to conflate words with the same meaning, rather than all words with a common linguistic root (so awe and awful don't have the same stem), and over-stemming is more problematic than under-stemming so we tend not to stem in cases that are hard to resolve. If you want to always reduce words to a root form and/or get a root form which is itself a word then Snowball's stemming algorithms likely aren't the right answer."""
words = text.split()
# Stem a single word
stemmed_word = s.stem_word(words[0])
print(f"Stemmed word: {stemmed_word}")
# Stem a list of words
stemmed_words = s.stem_words(words)
print(f"Stemmed words: {stemmed_words}")
# Stem words in parallel (for larger text sequences)
stemmed_words_parallel = s.stem_words_parallel(words)
print(f"Stemmed words (parallel): {stemmed_words_parallel}")
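Note that `text.split()` in the quickstart leaves punctuation attached to tokens (e.g. `systems,`), which changes the stems produced. A minimal tokenizer sketch is shown below; the regex and lowercasing policy are assumptions for illustration, not part of the library.

```python
import re

def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphabetic runs (plus apostrophes),
    # stripping the punctuation that str.split() leaves attached
    return re.findall(r"[a-z']+", text.lower())

tokens = tokenize("Awe and awful don't have the same stem.")
# tokens are now clean inputs for s.stem_words(tokens)
```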