textacy
v0.13.0 · verified Fri May 01
textacy is a Python library for NLP pre- and post-processing, built on top of spaCy. Version 0.13.0 requires Python >=3.9. Releases are irregular, with a focus on text extraction, tokenization, similarity, and topic modeling.
pip install textacy

Common errors
error ModuleNotFoundError: No module named 'textacy.extract' ↓
cause In textacy 0.13.0, extraction helpers live in the `textacy.extract` subpackage (e.g. `textacy.extract.ngrams`); an older installed version lays the package out differently.
fix
Upgrade textacy, then import from the subpackage:
from textacy.extract import ngrams
error TypeError: make_spacy_doc() missing 1 required positional argument: 'lang' ↓
cause `make_spacy_doc()` was called with only the text; the `lang` argument (a loaded spaCy pipeline or a model-name string) is required.
fix
Load a model first:
nlp = spacy.load('en_core_web_sm'), then make_spacy_doc(text, lang=nlp).
error AttributeError: 'str' object has no attribute 'noun_chunks' ↓
cause noun_chunks was called on a raw string; it is an attribute of a parsed spaCy `Doc`, not of `str`.
fix
Process the text into a Doc first:
doc = nlp(text); chunks = list(doc.noun_chunks).

Warnings
breaking textacy 0.13.0 removed many old top-level API functions; functionality now lives in dedicated subpackages such as `textacy.extract` and `textacy.preprocessing`. ↓
fix Replace old imports, e.g. `from textacy import extract`, then call `extract.ngrams(doc, 2)`.
deprecated The old `textacy.preprocess` module was removed in favor of the `textacy.preprocessing` subpackage. ↓
fix Use `textacy.preprocessing` functions, e.g. `preprocessing.normalize.whitespace(text)` or `preprocessing.replace.urls(text)`.
gotcha `make_spacy_doc` requires a `lang` argument: either a loaded spaCy `Language` pipeline or a model-name string. Omitting it raises a TypeError. ↓
fix Pass a pipeline or a name: `make_spacy_doc(text, lang=nlp)` or `make_spacy_doc(text, lang='en_core_web_sm')`.
gotcha `make_spacy_doc` returns a plain `spacy.tokens.Doc`; textacy 0.13.0 has no separate document class. Use the spaCy Doc API plus `textacy.extract` functions directly. ↓
fix Create a Doc first: `doc = textacy.make_spacy_doc(text, lang=nlp)`, then e.g. `textacy.extract.noun_chunks(doc)`.
Install

pip install textacy

Imports
- make_spacy_doc: wrong `from textacy.doc import Doc` · correct `from textacy import make_spacy_doc`
- Corpus: `from textacy import Corpus`
Quickstart
import spacy
import textacy
import textacy.extract
# Load spaCy model
nlp = spacy.load('en_core_web_sm')
# Create a spaCy Doc from text
text = "The quick brown fox jumps over the lazy dog."
doc = textacy.make_spacy_doc(text, lang=nlp) # returns spacy.tokens.Doc
# Extract noun chunks
chunks = list(textacy.extract.noun_chunks(doc))
print(chunks[:2])