stop-words
A Python library providing curated lists of stop words across 34+ languages. Stop words are common words (like “the”, “is”, “at”) that are typically filtered out in natural language processing and text analysis tasks. The library offers extensive language support, built-in caching for performance, and zero external dependencies. The current version is 2025.11.4, and the project maintains a regular release cadence. [1, 9]
Common errors
-
Words like 'The' or 'Is' are not removed, even when 'the' or 'is' are in the stop word list.
cause Input words are not converted to lowercase before being compared with the (lowercase) stop word list. [8]
fix Convert every word in the input text to lowercase (e.g., `word.lower()`) before checking it against the stop words.
-
Words with trailing punctuation (e.g., 'example.') are not filtered out by the stop word removal process.
cause Punctuation attached to a word prevents an exact match with the clean entries in the stop word list. [8]
fix Strip punctuation from words (e.g., using `str.strip(string.punctuation)` or regular expressions) before checking them against the stop word list.
-
I modified `english.txt` (or another language file) in the installed package, but `get_stop_words('en')` still returns the old list.
cause The library caches loaded lists internally, and editing files inside the installed package does not trigger a cache refresh. Editing installed files is also generally an anti-pattern for package customization. [1, 9, 21]
fix Instead of modifying package files, get the list programmatically and add or remove items (e.g., `my_stop_words = set(get_stop_words('en')) | {'my_new_word'}`). If absolutely necessary, clear the cache using `from stop_words import STOP_WORDS_CACHE; STOP_WORDS_CACHE.clear()`.
-
Calling `safe_get_stop_words('unsupported_lang_code')` returns an empty list, but I expected an error or the language to be supported.
cause The requested language code or name is either misspelled or genuinely not included in the library's collection. `safe_get_stop_words` silently returns an empty list for unsupported languages, whereas `get_stop_words` raises `StopWordError`. [1, 9]
fix Verify the language code or name against the library's available languages (often listed in the GitHub README or on the PyPI page). Use `get_stop_words` if you want an unsupported language to fail loudly. If the language is truly unsupported, you'll need to provide your own stop word list.
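An empty result for an unsupported language is easy to miss, so it can help to turn it into an explicit failure. The following is a minimal sketch, not part of the library: `require_stop_words` is a hypothetical helper, and `fake_loader` is a stand-in for whichever library function you pass in, so that the example runs without the package installed.

```python
def require_stop_words(loader, language):
    """Call loader(language) and raise if it yields no stop words.

    `loader` is any callable that takes a language code or name and
    returns a list of words (an assumption of this sketch).
    """
    words = loader(language)
    if not words:
        raise ValueError(f"No stop words available for language {language!r}")
    return set(words)


def fake_loader(language):
    # Stand-in for the library call: returns [] for unknown languages.
    sample_lists = {"en": ["the", "is", "at"]}
    return sample_lists.get(language, [])


english = require_stop_words(fake_loader, "en")
print(sorted(english))

try:
    require_stop_words(fake_loader, "unsupported_lang_code")
except ValueError as error:
    print(f"Handled: {error}")
```

In real code you would pass the library's own loader in place of `fake_loader`.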
Warnings
- gotcha The library caches stop word lists by default for performance. Directly modifying the raw text files (e.g., `english.txt`) within the installed package directory will likely not update the loaded stop words unless the cache is cleared or the application is restarted. [1, 9, 21]
- gotcha `get_stop_words()` raises `StopWordError` if the requested language is not supported or recognized, while `safe_get_stop_words()` returns an empty list instead of raising. An empty list can lead to silent failures (nothing gets filtered) if it is not explicitly handled or checked. [1, 9]
- gotcha Stop word lists are typically in lowercase. Input text containing capitalized words (e.g., 'The') or words with punctuation (e.g., 'word.') will not match their lowercase, punctuation-free counterparts in the stop word list, leading to them not being filtered. [8]
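The case and punctuation gotcha can be handled by normalizing each token before the lookup. This is a minimal sketch under stated assumptions: `remove_stop_words` is a hypothetical helper, tokens are split on whitespace only, and `sample_stop_words` is a tiny stand-in list rather than the library's real English list.

```python
import string


def remove_stop_words(text, stop_words):
    """Return the lowercased tokens of `text` that are not stop words."""
    stop_set = {word.lower() for word in stop_words}  # set gives fast lookups
    kept = []
    for raw in text.split():
        # Strip leading/trailing punctuation, then lowercase, so that
        # 'The' and 'door.' compare as 'the' and 'door'.
        token = raw.strip(string.punctuation).lower()
        if token and token not in stop_set:
            kept.append(token)
    return kept


# Stand-in list for illustration only (not the library's data).
sample_stop_words = ["the", "is", "a", "at"]

# Extend the list in your own code instead of editing package files.
custom_stop_words = set(sample_stop_words) | {"example"}

print(remove_stop_words("The cat is at the door.", sample_stop_words))
# → ['cat', 'door']
```

With the package installed, the list returned by `get_stop_words('en')` can be passed as `stop_words` in place of the stand-in.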
Install
-
pip install stop-words
Imports
- get_stop_words
from stop_words import get_stop_words
- safe_get_stop_words
from stop_words import safe_get_stop_words
- STOP_WORDS_CACHE
from stop_words import STOP_WORDS_CACHE
Quickstart
import string
from stop_words import get_stop_words
# Get English stop words by language code
english_stop_words = get_stop_words('en')
print(f"English stop words (first 5): {english_stop_words[:5]}")
# Get Spanish stop words using the full language name
spanish_stop_words = get_stop_words('spanish')
print(f"Spanish stop words (first 5): {spanish_stop_words[:5]}")
# Example usage in text processing: strip punctuation and lowercase each
# token before the lookup, and use a set for fast membership tests
text = "This is a sample sentence, demonstrating stop word removal."
english_stop_set = set(english_stop_words)
tokens = [word.strip(string.punctuation).lower() for word in text.split()]
filtered_words = [token for token in tokens if token and token not in english_stop_set]
print(f"Filtered text: {' '.join(filtered_words)}")