Stanza
Stanza, from the Stanford NLP Group, is a Python NLP library supporting over 70 human languages. It provides a fully neural pipeline for tokenization, multi-word token expansion, lemmatization, part-of-speech and morphological feature tagging, dependency parsing, and named entity recognition, and it also offers a stable Python interface to the Java Stanford CoreNLP Toolkit. The library is actively maintained; the current version is 1.11.1.
Warnings
- breaking As of v1.11.1, Stanza's default model download location has changed from `~/stanza_resources` to system-specific cache directories via `platformdirs`. This may affect users who relied on the old default path or custom scripts expecting models in `~/stanza_resources`.
- gotcha Stanza's neural models require PyTorch. Users often encounter `ERROR: Could not find a version that satisfies the requirement torch` during `pip install stanza` if PyTorch is not pre-installed or if there are compatibility issues. Installing PyTorch separately first, especially via a system package manager (e.g., `conda install pytorch ...`), is frequently recommended for a smoother installation.
- gotcha Processing individual documents or sentences one by one in a loop can be significantly slower than processing them in batches. Stanza is optimized for batch processing.
- deprecated Prior to version 1.0.0, the library was named `stanfordnlp`. If you are looking for very old documentation or examples, you might encounter references to this legacy package name.
- gotcha Stanza has a strict Python version requirement of >=3.9. Using older Python versions can lead to various runtime errors, including `OSError: [Errno 22] Invalid argument` during model loading on macOS with Python <=3.7.1.
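Two of the warnings above can be handled defensively in code. A minimal sketch: the interpreter floor comes from the `>=3.9` requirement, and the `STANZA_RESOURCES_DIR` environment variable lets you pin model downloads back to the pre-1.11.1 default path (set it before Stanza downloads anything):

```python
import os
import sys

# Fail fast on unsupported interpreters (Stanza requires Python >= 3.9).
assert sys.version_info >= (3, 9), "Stanza requires Python 3.9 or newer"

# Pin model downloads to the old default location instead of the new
# platformdirs-based cache; set this BEFORE downloading or loading models.
os.environ["STANZA_RESOURCES_DIR"] = os.path.expanduser("~/stanza_resources")
```

Setting the environment variable in the shell (e.g. `export STANZA_RESOURCES_DIR=...`) works the same way and also covers scripts you don't control.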
Install
-
pip install stanza
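If `pip install stanza` fails while resolving `torch`, installing PyTorch first usually helps. A hedged sketch: the index URL below is PyTorch's official CPU-only wheel index, not part of Stanza; conda users can substitute `conda install pytorch` for the first line:

```shell
# Install a CPU-only PyTorch build first, then Stanza on top of it.
pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install stanza
```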
Imports
- stanza
import stanza
Quickstart
import stanza
# Download an English model (only needs to be run once)
# Stanza will auto-download if models are not found, but explicit download is good practice.
stanza.download('en')
# Initialize the English neural pipeline
nlp = stanza.Pipeline('en')
# Process some text
text = "Barack Obama was born in Hawaii. He was the 44th President of the United States."
doc = nlp(text)
# Access annotations
print(f"Processing: '{text}'")
for i, sent in enumerate(doc.sentences):
    print(f"\nSentence {i+1}:")
    for word in sent.words:
        # Look up the head word in the CURRENT sentence (word.head is 1-indexed; 0 means ROOT)
        head = sent.words[word.head - 1].text if word.head > 0 else "ROOT"
        print(f"  {word.text}\tUPOS: {word.upos}\tLemma: {word.lemma}\tDepRel: {word.deprel}\tHead: {head}")

print("\nNamed Entities:")
for ent in doc.entities:
    print(f"  {ent.text}\tType: {ent.type}")