textacy

0.13.0 verified Fri May 01 auth: no python

textacy is a Python library for NLP pre- and post-processing, built on top of spaCy. The current version, 0.13.0, requires Python >=3.9. Releases are irregular and focus on text extraction, tokenization, similarity, and topic modeling.

pip install textacy
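error ModuleNotFoundError: No module named 'textacy.extract'
cause `textacy.extract` is a subpackage in textacy 0.13.0, so this error most likely points to a broken or very old installation, or to a local file named textacy.py shadowing the package.
fix
Reinstall with pip install -U textacy and rename any local textacy.py; then from textacy import extract and from textacy.extract import ngrams should both work.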
error TypeError: make_spacy_doc() missing 1 required positional argument: 'lang'
cause make_spacy_doc was called with only the text; the second argument, which identifies the spaCy pipeline, is required.
fix
Load a model first: nlp = spacy.load('en_core_web_sm'), then call make_spacy_doc(text, nlp).
error AttributeError: 'str' object has no attribute 'noun_chunks'
cause noun_chunks was accessed on a raw string; it is an attribute of a spaCy Doc.
fix
Process the text first: doc = nlp(text), then use doc.noun_chunks.
breaking Since textacy 0.11, `textacy.extract` has been organized into submodules (e.g., `textacy.extract.basics`, `textacy.extract.triples`), and several functions were renamed or moved. Code written for older releases may fail with ImportError or AttributeError.
fix Import from the current location, e.g., `from textacy.extract import ngrams`, and check the changelog for renamed functions.
deprecated `textacy.preprocess` was renamed to `textacy.preprocessing`; its text-cleaning functions are grouped under `normalize`, `remove`, and `replace`.
fix Use, e.g., `textacy.preprocessing.normalize.whitespace(text)` or `textacy.preprocessing.replace.urls(text)`.
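gotcha `make_spacy_doc` requires a second argument, `lang`, identifying the spaCy pipeline. It may be a loaded `Language` object or the name of an installed pipeline; omitting it raises TypeError, and naming a pipeline that is not installed raises an error from spaCy.
fix Pass a loaded nlp pipeline to avoid reloading the model on every call: `make_spacy_doc(text, nlp)`.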
gotcha Attributes such as `noun_chunks` belong to `spacy.tokens.Doc`, not to raw text. You must call `nlp(text)` or `make_spacy_doc(text, nlp)` first.
fix Create a Doc first: `doc = nlp(text)`, then use `doc.noun_chunks`.
pip install textacy[lang]

Basic usage: create a spaCy Doc with textacy and extract noun chunks.

import spacy
import textacy

# Load spaCy model (install it first: python -m spacy download en_core_web_sm)
nlp = spacy.load('en_core_web_sm')

# Create a spaCy Doc from text
text = "The quick brown fox jumps over the lazy dog."
doc = textacy.make_spacy_doc(text, nlp)  # returns spacy.tokens.Doc

# Extract noun chunks (needs a pipeline with a parser, such as en_core_web_sm)
chunks = list(doc.noun_chunks)
print(chunks[:2])