Stanza

1.11.1 · active · verified Sun Apr 12

Stanza, from the Stanford NLP Group, is a Python NLP library supporting more than 70 human languages. It provides a fully neural pipeline for text analysis, covering tokenization, multi-word token expansion, lemmatization, part-of-speech and morphological feature tagging, dependency parsing, and named entity recognition. Stanza also offers a stable Python interface to the Java-based Stanford CoreNLP toolkit. The library is actively maintained and regularly updated; the current release is 1.11.1.

Install

pip install stanza

Imports

import stanza

Quickstart

This quickstart downloads the default English models, builds a Stanza pipeline, runs it on a sample text, and prints word-level annotations (UPOS tag, lemma, dependency relation, and head word), followed by the named entities found in the text.

import stanza

# Download an English model (only needs to be run once)
# Stanza will auto-download if models are not found, but explicit download is good practice.
stanza.download('en')

# Initialize the English neural pipeline
nlp = stanza.Pipeline('en')

# Process some text
text = "Barack Obama was born in Hawaii. He was the 44th President of the United States."
doc = nlp(text)

# Access annotations
print(f"Processing: '{text}'")
for i, sent in enumerate(doc.sentences):
    print(f"\nSentence {i+1}:")
    for word in sent.words:
        # word.head is a 1-based index into this sentence's words; 0 marks the root
        head = sent.words[word.head - 1].text if word.head > 0 else 'ROOT'
        print(f"  {word.text}\tUPOS: {word.upos}\tLemma: {word.lemma}\tDepRel: {word.deprel}\tHead: {head}")

print("\nNamed Entities:")
for ent in doc.entities:
    print(f"  {ent.text}\tType: {ent.type}")
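The head lookup in the loop above depends on `word.head` being a 1-based index into the words of the same sentence, with 0 reserved for the root. Below is a minimal sketch of the same resolution over plain dictionaries. It assumes the per-sentence shape returned by Stanza's `doc.to_dict()`: word entries carry an integer `id` and `head`, while multi-word token entries carry a tuple `id` and are skipped. The function name `attach_heads` and the sample sentence are hypothetical, written by hand for illustration rather than produced by Stanza.

```python
from typing import Any


def attach_heads(sentence: list[dict[str, Any]]) -> list[tuple[str, str, str]]:
    """Return (word, deprel, head_word) triples for one sentence.

    Assumes the per-sentence layout of Stanza's doc.to_dict():
    word entries have an int 'id' (1-based) and an int 'head' (0 = root);
    multi-word token entries have a tuple 'id' such as (1, 2).
    """
    # Keep only syntactic words; multi-word token entries use tuple ids
    words = [w for w in sentence if isinstance(w["id"], int)]
    # Map each 1-based word id to its surface text for head lookup
    by_id = {w["id"]: w["text"] for w in words}
    triples = []
    for w in words:
        # A head of 0 means the word is the root of the dependency tree
        head = "ROOT" if w["head"] == 0 else by_id[w["head"]]
        triples.append((w["text"], w["deprel"], head))
    return triples


# Hypothetical, hand-written input in the assumed to_dict() shape
sentence = [
    {"id": 1, "text": "He", "head": 2, "deprel": "nsubj"},
    {"id": 2, "text": "slept", "head": 0, "deprel": "root"},
    {"id": 3, "text": ".", "head": 2, "deprel": "punct"},
]

for triple in attach_heads(sentence):
    print(triple)
```

Keying the lookup on `id` rather than on list position keeps it correct even when the list also contains non-word entries, such as multi-word tokens.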
