{"id":8685,"library":"stop-words","title":"stop-words","description":"A Python library providing curated lists of stop words across 34+ languages. Stop words are common words (like “the”, “is”, “at”) that are typically filtered out in natural language processing and text analysis tasks. It offers extensive language support, built-in caching for performance, and zero external dependencies. The current version is 2025.11.4 and it maintains a regular release cadence. [1, 9]","status":"active","version":"2025.11.4","language":"en","source_language":"en","source_url":"https://github.com/Alir3z4/python-stop-words.git","tags":["nlp","text-processing","stop-words","language-processing"],"install":[{"cmd":"pip install stop-words","lang":"bash","label":"Install latest version"}],"dependencies":[],"imports":[{"note":"Primary function to retrieve stop words for a specified language.","symbol":"get_stop_words","correct":"from stop_words import get_stop_words"},{"note":"A safer alternative that handles unsupported languages gracefully (returns empty list).","symbol":"safe_get_stop_words","correct":"from stop_words import safe_get_stop_words"},{"note":"Access the internal cache for advanced control, e.g., clearing or inspecting cached languages.","symbol":"STOP_WORDS_CACHE","correct":"from stop_words import STOP_WORDS_CACHE"}],"quickstart":{"code":"from stop_words import get_stop_words\n\n# Get English stop words\nenglish_stop_words = get_stop_words('en')\nprint(f\"English stop words (first 5): {english_stop_words[:5]}\")\n\n# Get Spanish stop words using full name\nspanish_stop_words = get_stop_words('spanish')\nprint(f\"Spanish stop words (first 5): {spanish_stop_words[:5]}\")\n\n# Example usage in text processing\ntext = \"This is a sample sentence, demonstrating stop word removal.\"\nfiltered_words = [word.lower() for word in text.replace(',', '').replace('.', '').split() if word.lower() not in english_stop_words]\nprint(f\"Filtered text: {' 
'.join(filtered_words)}\")","lang":"python","description":"Demonstrates how to fetch stop words for English and Spanish and apply them to a simple text string. It highlights the importance of lowercasing and punctuation removal for effective filtering. [1, 9]"},"warnings":[{"fix":"Customize the list in your own code instead of editing package files, working on a copy (e.g., `my_list = list(get_stop_words('en')); my_list.append('custom_word')`), or explicitly clear the cache (`from stop_words import STOP_WORDS_CACHE; STOP_WORDS_CACHE.clear()`).","message":"The library caches stop word lists by default for performance. Directly modifying the raw text files (e.g., `english.txt`) within the installed package directory will likely not update the loaded stop words unless the cache is cleared or the application is restarted. [1, 9, 21]","severity":"gotcha","affected_versions":"All versions with caching (>=2015.2.23)"},{"fix":"Catch `StopWordError` around calls to `get_stop_words()`, or use `safe_get_stop_words()` if an empty list is the desired fallback for unsupported languages, and check whether the returned list is empty. Consult documentation for available language codes.","message":"The `get_stop_words()` function raises `StopWordError` if the requested language is not supported or recognized, while `safe_get_stop_words()` returns an empty list instead. An unhandled exception stops processing, and an unchecked empty list from the safe variant silently disables filtering. [1, 9]","severity":"gotcha","affected_versions":"All versions"},{"fix":"Always normalize your input text by converting words to lowercase (e.g., `.lower()`) and stripping punctuation before comparing them to the stop word list.","message":"Stop word lists are typically in lowercase. Input text containing capitalized words (e.g., 'The') or words with punctuation (e.g., 'word.') will not match their lowercase, punctuation-free counterparts in the stop word list, so those words are not filtered. 
[8]","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Ensure all words in your input text are converted to lowercase (e.g., `word.lower()`) before checking them against the stop words.","cause":"Input text words are not converted to lowercase before comparison with the (lowercase) stop word list. [8]","error":"Words like 'The' or 'Is' are not removed, even when 'the' or 'is' are in the stop word list."},{"fix":"Strip punctuation from words (e.g., using `word.strip(string.punctuation)` or regular expressions) before checking them against the stop word list.","cause":"Punctuation attached to words prevents an exact match with the clean stop words in the list. [8]","error":"Words with trailing punctuation (e.g., 'example.') are not being filtered out by the stop word removal process."},{"fix":"Instead of modifying package files, get the list programmatically and add/remove items (e.g., `my_stop_words = set(get_stop_words('en')) | {'my_new_word'}`). If absolutely necessary, clear the cache using `from stop_words import STOP_WORDS_CACHE; STOP_WORDS_CACHE.clear()`.","cause":"The library employs internal caching, and direct modification of files within the installed package doesn't trigger a cache refresh. Editing installed package files is generally an anti-pattern for customization. [1, 9, 21]","error":"I modified `english.txt` (or another language file) in the installed package, but `get_stop_words('en')` still returns the old list."},{"fix":"Verify the language code/name against the library's available languages (often listed in the GitHub README or PyPI page). If the language is truly unsupported, you'll need to provide your own stop word list.","cause":"The requested language code or name is either misspelled or genuinely not included in the library's collection. `get_stop_words` raises `StopWordError` for unsupported languages, while `safe_get_stop_words` returns an empty list instead. 
[1, 9]","error":"Calling `get_stop_words('unsupported_lang_code')` raises `StopWordError` (or `safe_get_stop_words('unsupported_lang_code')` returns an empty list), but I expected the language to be supported."}]}