{"id":2673,"library":"py-rust-stemmers","title":"Rust Stemmers for Python","description":"py-rust-stemmers (version 0.1.5) is a high-performance Python wrapper around the Rust `rust-stemmers` library. It implements the Snowball stemming algorithm, offering efficient word stemming for multiple languages with support for parallel processing, making it a powerful tool for text processing tasks. The library is actively maintained, with its latest version uploaded to PyPI in February 2025 and continued development activity on GitHub through late 2025.","status":"active","version":"0.1.5","language":"en","source_language":"en","source_url":"https://github.com/qdrant/py-rust-stemmers","tags":["stemming","nlp","rust","performance","text-processing","snowball"],"install":[{"cmd":"pip install py-rust-stemmers","lang":"bash","label":"Install latest version"}],"dependencies":[],"imports":[{"symbol":"SnowballStemmer","correct":"from py_rust_stemmers import SnowballStemmer"}],"quickstart":{"code":"from py_rust_stemmers import SnowballStemmer\n\n# Initialize the stemmer for the English language\ns = SnowballStemmer('english')\n\ntext = \"\"\"This stem form is often a word itself, but this is not always the case as this is not a requirement for text search systems, which are the intended field of use. We also aim to conflate words with the same meaning, rather than all words with a common linguistic root (so awe and awful don't have the same stem), and over-stemming is more problematic than under-stemming so we tend not to stem in cases that are hard to resolve. 
If you want to always reduce words to a root form and/or get a root form which is itself a word then Snowball's stemming algorithms likely aren't the right answer.\"\"\"\nwords = text.split()\n\n# Stem a single word\nstemmed_word = s.stem_word(words[0])\nprint(f\"Stemmed word: {stemmed_word}\")\n\n# Stem a list of words\nstemmed_words = s.stem_words(words)\nprint(f\"Stemmed words: {stemmed_words}\")\n\n# Stem words in parallel (for larger text sequences)\nstemmed_words_parallel = s.stem_words_parallel(words)\nprint(f\"Stemmed words (parallel): {stemmed_words_parallel}\")","lang":"python","description":"Initialize a `SnowballStemmer` for a specific language and then use `stem_word`, `stem_words`, or `stem_words_parallel` for single word, batch, or parallel stemming, respectively."},"warnings":[{"fix":"Ensure Rust and `maturin` are installed in your build environment, or rely on pre-built wheels where available. For example: `curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh` followed by `pip install maturin`.","message":"When installing from source or in environments like Docker/CI without pre-built wheels, `py-rust-stemmers` requires the Rust toolchain and `maturin` to be present for compilation. This can add complexity to build pipelines.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Understand the distinction between stemming and lemmatization. If full lemmatization is needed, consider libraries like spaCy or NLTK's WordNetLemmatizer, which use linguistic knowledge bases.","message":"Snowball stemming algorithms aim to reduce words to a common root form, which may not always be a dictionary word or a true lemma. 
If your application requires actual dictionary words or full lemmatization, Snowball stemmers (and thus `py-rust-stemmers`) may not be the appropriate solution.","severity":"gotcha","affected_versions":"All versions"},{"fix":"If you maintain or fork the Rust bindings and need more precise Python exceptions, return `Result` types and map errors to specific `PyErr` types such as `PyValueError`. In Python code, treat `PanicException` as a last-resort fallback for unrecoverable Rust errors; it derives from `BaseException`, so a plain `except Exception` does not catch it.","message":"A `panic!` in the underlying Rust code typically surfaces in Python as a `pyo3_runtime.PanicException`. This is less specific than ordinary Python exceptions, which makes granular error handling difficult without explicit Rust-to-Python exception mapping.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-10T00:00:00.000Z","next_check":"2026-07-09T00:00:00.000Z"}