spaCy: Industrial-strength Natural Language Processing

3.8.13 · active · verified Sun Mar 29

spaCy is an open-source library for advanced Natural Language Processing (NLP), written in Python and Cython. It is designed for production use: it processes large volumes of text efficiently and ships trained neural network pipelines for tasks such as part-of-speech tagging, dependency parsing, and named entity recognition. The current version is 3.8.13, and the project is actively developed, with frequent releases that address compatibility, performance, and new features.

Quickstart

This quickstart loads a trained English pipeline and uses it to process a sentence. It prints each token's text, lemma, part-of-speech tag, dependency label, and entity type, then lists the named entities found in the sentence. Trained pipelines are not bundled with the spaCy library, so they must be downloaded separately after installing it.

import spacy

# Load a pre-trained English pipeline
# Make sure to run `python -m spacy download en_core_web_sm` first
nlp = spacy.load("en_core_web_sm")

# Process a text
doc = nlp("Apple is looking at buying U.K. startup for $1 billion.")

# Iterate over tokens
for token in doc:
    print(f"{token.text:<15} {token.lemma_:<10} {token.pos_:<10} {token.dep_:<10} {token.ent_type_:<10}")

# Access named entities
print("\nNamed Entities:")
for ent in doc.ents:
    print(f"{ent.text} ({ent.label_})")
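
When there are many texts to process, the same pipeline can be applied in batches. The following is a minimal sketch and not part of the original quickstart. It uses `nlp.pipe` to stream documents, and it falls back to a blank English pipeline (tokenizer only) via `spacy.blank("en")` if `en_core_web_sm` is not installed. The helper name `load_pipeline`, the sample texts, and the batch size are illustrative assumptions.

```python
import spacy


def load_pipeline(name="en_core_web_sm"):
    """Load a trained pipeline, or fall back to a blank tokenizer-only one.

    `load_pipeline` is a hypothetical helper for this sketch, not a spaCy API.
    """
    try:
        return spacy.load(name)
    except OSError:
        # spacy.load raises OSError when the pipeline package is not installed
        return spacy.blank("en")


nlp = load_pipeline()

# Illustrative sample texts
texts = [
    "Apple is looking at buying U.K. startup for $1 billion.",
    "spaCy processes text in batches.",
]

# nlp.pipe yields Doc objects in input order and batches the texts internally
docs = list(nlp.pipe(texts, batch_size=50))

for doc in docs:
    # With the blank fallback, doc.ents is empty because no NER component runs
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    print(f"{len(doc)} tokens | entities: {entities}")
```

With the blank fallback only tokenization is available, so lemmas, tags, dependencies, and entities will be empty until a trained pipeline is downloaded.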
