Snowball Stemmer Python Library

3.0.1 · active · verified Sun Mar 29

This package provides 32 stemmers for 30 languages, generated from the widely used Snowball stemming algorithms. It is a pure Python implementation, commonly employed in information retrieval and text-processing pipelines for word normalization. Currently at version 3.0.1, the library is actively maintained and offers a lightweight, fast way to reduce words to their base (stem) forms.

Warnings

Install
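The package is published on PyPI under the name shown below; a typical installation is:

```shell
pip install snowballstemmer
```

No compiled extensions are required, since the implementation is pure Python.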

Imports
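The public API lives in a single top-level module; the two entry points used throughout this page are `stemmer()` and `algorithms()`:

```python
import snowballstemmer

# Factory for a stemmer object for a named algorithm, e.g. 'english'
assert callable(snowballstemmer.stemmer)
# Returns the list of available algorithm names
assert callable(snowballstemmer.algorithms)
```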

Quickstart

This example demonstrates how to initialize an English stemmer and use it to stem individual words and lists of words. It also shows how to retrieve the list of supported stemming algorithms.

import snowballstemmer

algorithms = snowballstemmer.algorithms()
# print(f"Available stemmers: {', '.join(algorithms)}")

stemmer = snowballstemmer.stemmer('english')
words = ['running', 'runs', 'ran', 'runner', 'unnecessary']
stems = [stemmer.stemWord(word) for word in words]
print(f"Words: {words}")
print(f"Stems: {stems}")

# Strip punctuation before splitting, otherwise tokens like "run." are stemmed with the period attached
sentence = "We are running in the fields and watching runners run."
sentence_words = sentence.lower().replace('.', '').split()
sentence_stems = stemmer.stemWords(sentence_words)
print(f"Sentence words: {sentence_words}")
print(f"Sentence stems: {sentence_stems}")
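Because the library ships stemmers for many languages, the same API applies to any name returned by `algorithms()`. A minimal sketch using the German stemmer (the example words are illustrative; exact stems are not shown, as they depend on the algorithm's rules):

```python
import snowballstemmer

# 'german' is one of the algorithm names; see snowballstemmer.algorithms()
de_stemmer = snowballstemmer.stemmer('german')
stems = de_stemmer.stemWords(['laufen', 'läuft', 'Läufer'])
print(stems)
```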
