Snowball Stemmer Python Library
Version 3.0.1 | verified Tue May 12 | auth: no | python | install: verified
This package provides 32 stemmers for 30 languages, generated from the widely used Snowball algorithms. It is a pure Python implementation, commonly used in information retrieval and text processing pipelines to normalize words. The current version is 3.0.1, and the library is actively maintained. It offers a lightweight way to reduce words to their stems.
pip install snowballstemmer
Common errors
error ModuleNotFoundError: No module named 'snowballstemmer'
cause The `snowballstemmer` package is not installed in the current Python environment.
fix
pip install snowballstemmer
error AttributeError: 'Stemmer' object has no attribute 'stem'
cause Users often confuse the `stemWord` or `stemWords` methods of `snowballstemmer` with a `stem` method found in other stemming libraries like NLTK.
fix
Use `stemmer.stemWord('word')` for a single word or `stemmer.stemWords(['word1', 'word2'])` for a list of words.
error TypeError: Stemmer.__init__() missing 1 required positional argument: 'stemmers'
cause The `Stemmer` class is being instantiated directly with a language name string, but its constructor expects an internal list of stemmer objects; the `snowballstemmer.stemmer()` factory function should be used instead.
fix
import snowballstemmer; my_stemmer = snowballstemmer.stemmer('english')
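Code written against a `stem`-style interface (the AttributeError case above) can be bridged with a thin adapter instead of rewriting every call site. The sketch below is an illustration, not part of `snowballstemmer`: `StemAdapter` and `stem_all` are hypothetical names, and the only assumption is that the wrapped object exposes `stemWord` and `stemWords` as described above.

```python
class StemAdapter:
    """Hypothetical wrapper exposing a stem() method on a snowballstemmer object."""

    def __init__(self, stemmer):
        # `stemmer` is expected to come from snowballstemmer.stemmer('<language>')
        self._stemmer = stemmer

    def stem(self, word):
        # Reject lists early so the mistake is reported at the call site
        if not isinstance(word, str):
            raise TypeError("stem() expects a single string; use stem_all() for lists")
        return self._stemmer.stemWord(word)

    def stem_all(self, words):
        # Delegate a sequence of strings to stemWords in one call
        return self._stemmer.stemWords(list(words))
```

A typical call would look like `StemAdapter(snowballstemmer.stemmer('english')).stem('running')`.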
error TypeError: expected string, list found
cause The `stemWord` method is designed to process a single string argument, but it received a list of words.
fix
Use `stemmer.stemWords(['word1', 'word2'])` for processing a list of words, or iterate and call `stemWord` for each string.
Warnings
gotcha Snowball stemmers are designed for information retrieval, not linguistic correctness. The generated 'stem' is often not a dictionary word or a true lemma. Expecting a grammatically correct root form is a common misconception.
fix Understand that the output is a base form for conflation, not necessarily a dictionary entry. If true lemmas are needed, consider a lemmatization library (e.g., NLTK with WordNet).
gotcha Applying the wrong language rules is a common mistake. Each stemmer is language-specific. Using an English stemmer on non-English text, or vice versa, will yield incorrect results.
fix Explicitly select the appropriate stemmer for the language of your text (e.g., `snowballstemmer.stemmer('german')`). Implement language detection if processing multilingual content.
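One way to make the language choice explicit is to resolve it through a small lookup that fails loudly for anything unsupported. The sketch below is only an illustration: `CODE_TO_ALGORITHM` and `algorithm_for` are hypothetical names, the mapping is a partial example, and the language code is assumed to come from your own metadata or a separate language-detection step.

```python
# Hypothetical mapping from ISO 639-1 codes to Snowball algorithm names.
# Extend it to cover the languages your pipeline actually handles.
CODE_TO_ALGORITHM = {
    "da": "danish",
    "de": "german",
    "en": "english",
    "es": "spanish",
    "fr": "french",
    "it": "italian",
}


def algorithm_for(lang_code, available):
    """Map a language code to an algorithm name, or raise ValueError.

    `available` is the collection of supported names,
    for example the result of snowballstemmer.algorithms().
    """
    name = CODE_TO_ALGORITHM.get(lang_code.strip().lower())
    if name is None or name not in available:
        raise ValueError(f"no Snowball stemmer configured for language code {lang_code!r}")
    return name
```

The result can be passed straight to the factory, e.g. `snowballstemmer.stemmer(algorithm_for('de', snowballstemmer.algorithms()))`. Raising instead of silently falling back to English keeps mis-tagged text from being stemmed with the wrong rules.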
gotcha Stemming can lead to over-stemming (stripping too much, grouping unrelated words) or under-stemming (not stripping enough, failing to group related words) due to its rule-based nature.
fix Evaluate the stemming output on representative data and understand its limitations. For higher precision, consider hybrid approaches or lemmatization, especially for irregular forms.
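A simple way to evaluate stemming on representative data is to group a vocabulary by stem and read through the groups. The helper names below (`group_by_stem`, `merged_groups`) are hypothetical, and `stem_word` stands for any callable that maps one word to its stem, such as a stemmer's `stemWord` method.

```python
from collections import defaultdict


def group_by_stem(words, stem_word):
    """Group words under the stem that stem_word(word) returns."""
    groups = defaultdict(set)
    for word in words:
        groups[stem_word(word)].add(word)
    return dict(groups)


def merged_groups(groups):
    """Keep only stems under which more than one distinct word was conflated."""
    return {stem: sorted(members) for stem, members in groups.items() if len(members) > 1}
```

Calling `merged_groups(group_by_stem(vocabulary, stemmer.stemWord))` lists every conflation the stemmer made. Unrelated words sharing a group point to over-stemming, while related words that end up in different groups point to under-stemming.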
gotcha A `Stemmer` object is not thread-safe: sharing one object between threads that stem concurrently can lead to unexpected behavior.
fix Create a separate `Stemmer` object for each thread. The stemming code is re-entrant, so separate objects can run concurrently. Creating a stemmer object has some cost, so reuse each thread's object instead of creating a new one for every call.
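The one-object-per-thread advice can be sketched with `threading.local`, which gives each thread its own attribute storage. `get_thread_stemmer` is a hypothetical helper, and `factory` is assumed to be a zero-argument callable such as `lambda: snowballstemmer.stemmer('english')`.

```python
import threading

# Each thread sees its own independent set of attributes on this object
_thread_state = threading.local()


def get_thread_stemmer(factory):
    """Return the calling thread's stemmer, creating it once via factory()."""
    stemmer = getattr(_thread_state, "stemmer", None)
    if stemmer is None:
        # First call in this thread: pay the creation cost once, then reuse
        stemmer = factory()
        _thread_state.stemmer = stemmer
    return stemmer
```

Worker code would then call `get_thread_stemmer(factory).stemWords(tokens)`. This sketch keeps a single stemmer per thread; a pipeline handling several languages would need to key the cached objects by language as well.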
gotcha For performance-critical applications, the pure Python `snowballstemmer` can be slower than C-based implementations. A significant speedup can be achieved by installing `PyStemmer`.
fix Install `PyStemmer` (e.g., `pip install PyStemmer`). The `snowballstemmer` library will automatically detect and utilize `PyStemmer` for faster processing if it's available.
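To see which situation an environment is in, it can help to check whether PyStemmer is importable at all. The sketch below relies on PyStemmer's import name being `Stemmer`; the helper names are hypothetical, and the check only reports whether the module can be found, not which implementation a given stemmer object actually uses.

```python
import importlib.util


def pystemmer_available():
    """Return True if the PyStemmer module (import name: Stemmer) can be found."""
    return importlib.util.find_spec("Stemmer") is not None


def describe_backend():
    """Summarize which implementation is expected in this environment."""
    if pystemmer_available():
        return "PyStemmer found: C implementation expected"
    return "PyStemmer not found: pure Python implementation expected"


print(describe_backend())
```

Logging this once at startup makes it easier to explain throughput differences between environments.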
Install compatibility (verified, last tested: 2026-05-12)
python  os / libc       wheel  install  import  disk
3.10    alpine (musl)   wheel  -        0.27s   19.3M
3.10    alpine (musl)   -      -        0.18s   19.3M
3.10    slim (glibc)    wheel  1.6s     0.30s   20M
3.10    slim (glibc)    -      -        0.36s   20M
3.11    alpine (musl)   wheel  -        0.43s   21.6M
3.11    alpine (musl)   -      -        0.43s   21.6M
3.11    slim (glibc)    wheel  1.7s     0.40s   22M
3.11    slim (glibc)    -      -        0.35s   22M
3.12    alpine (musl)   wheel  -        0.32s   13.4M
3.12    alpine (musl)   -      -        0.53s   13.4M
3.12    slim (glibc)    wheel  1.6s     0.36s   14M
3.12    slim (glibc)    -      -        0.41s   14M
3.13    alpine (musl)   wheel  -        0.26s   13.2M
3.13    alpine (musl)   -      -        0.27s   13.1M
3.13    slim (glibc)    wheel  1.6s     0.27s   14M
3.13    slim (glibc)    -      -        0.29s   14M
3.9     alpine (musl)   wheel  -        0.05s   18.7M
3.9     alpine (musl)   -      -        0.05s   18.7M
3.9     slim (glibc)    wheel  1.8s     0.04s   19M
3.9     slim (glibc)    -      -        0.05s   19M
Imports
- stemmer
wrong: from snowballstemmer import Stemmer (incorrect class name and module structure)
correct: import snowballstemmer; stemmer_obj = snowballstemmer.stemmer('english')
Quickstart last tested: 2026-04-24
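import snowballstemmer

# List the language names accepted by snowballstemmer.stemmer()
algorithms = snowballstemmer.algorithms()
# print(f"Available stemmers: {', '.join(algorithms)}")

# Create a stemmer through the factory function
stemmer = snowballstemmer.stemmer('english')

# Stem words one at a time with stemWord
words = ['running', 'runs', 'ran', 'runner', 'unnecessary']
stems = [stemmer.stemWord(word) for word in words]
print(f"Words: {words}")
print(f"Stems: {stems}")

# Stem a whole list in one call with stemWords.
# The trailing period is stripped first so the last token is 'run', not 'run.'
sentence_words = "We are running in the fields and watching runners run.".lower().rstrip('.').split()
sentence_stems = stemmer.stemWords(sentence_words)
print(f"Sentence words: {sentence_words}")
print(f"Sentence stems: {sentence_stems}")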