{"id":4718,"library":"pystemmer","title":"PyStemmer","description":"PyStemmer provides efficient access to stemming algorithms from the Snowball project, wrapping the `libstemmer_c` library in a Python module. It is primarily used in information retrieval and search engines to reduce words to a common stem. The current version is 3.0.0. Releases are irregular and typically driven by updates to the underlying Snowball library or by Python compatibility changes.","status":"active","version":"3.0.0","language":"en","source_language":"en","source_url":"https://github.com/snowballstem/pystemmer/","tags":["stemming","NLP","information retrieval","text processing","Snowball"],"install":[{"cmd":"pip install PyStemmer","lang":"bash","label":"Install PyStemmer"}],"dependencies":[{"reason":"PyStemmer is a wrapper around the C implementation of Snowball stemmers for performance.","package":"libstemmer_c","optional":false},{"reason":"Required for building from source if pre-built wheels are not available.","package":"C compiler","optional":true},{"reason":"Required for building from source if pre-built wheels are not available.","package":"Python header files","optional":true}],"imports":[{"symbol":"Stemmer","correct":"import Stemmer"}],"quickstart":{"code":"import Stemmer\n\n# Get a list of available algorithms\nalgorithms = Stemmer.algorithms()\n# print(algorithms) # Uncomment to see the list\n\n# Get an instance of the English stemmer\nstemmer = Stemmer.Stemmer('english')\n\n# Stem a single word\nword = 'cycling'\nstemmed_word = stemmer.stemWord(word)\nprint(f\"'{word}' stemmed to: '{stemmed_word}'\")\n\n# Stem a list of words\nwords = ['connection', 'connections', 'connective', 'connected', 'connecting']\nstemmed_words = stemmer.stemWords(words)\nprint(f\"Words {words} stemmed to: {stemmed_words}\")","lang":"python","description":"Initialize a stemmer for a specific language and use it to stem single words or lists of words. Reuse the stemmer object where possible: each instance caches stemmed words, so repeated words are stemmed faster."},"warnings":[{"fix":"Create a new `Stemmer.Stemmer()` object for each thread or protect access to a shared stemmer object with a mutex (e.g., `threading.Lock`).","message":"Stemmer objects are not thread-safe. Sharing one instance across threads without synchronization can cause race conditions.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Ensure you have a C compiler (e.g., GCC, Clang) and Python development headers installed on your system. For Debian/Ubuntu: `sudo apt-get install build-essential python3-dev`.","message":"Installation can fail when no pre-built wheel is available for your system and Python version, because building from source requires a C compiler and the Python development header files.","severity":"gotcha","affected_versions":"All versions, particularly on less common architectures or specific Python versions"},{"fix":"Upgrade to Python 3.8+ and PyStemmer 3.0.0 or newer. Earlier PyStemmer releases that support Python 3 remain available for older Python 3 environments.","message":"Python 2 is no longer actively supported. PyStemmer 2.2.0.1 was the final version tested with Python 2.","severity":"breaking","affected_versions":"< 3.0.0"},{"fix":"Ensure all input text is consistently handled as Unicode strings before passing it to PyStemmer. Explicitly decode byte strings if their encoding is known and not UTF-8.","message":"Input strings are assumed to be Unicode. `stemWords` also accepts UTF-8 encoded byte strings, but byte strings in other encodings, or inconsistently handled Unicode, can lead to unexpected stemming results.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-12T00:00:00.000Z","next_check":"2026-07-11T00:00:00.000Z"}