PyTextRank

Version 3.3.0, verified Fri May 01

Python implementation of TextRank as a spaCy pipeline extension for graph-based natural language processing, including phrase extraction, keyword extraction, and knowledge graph construction. Current version 3.3.0; requires Python >= 3.7; follows spaCy's pipeline-extension pattern. Development is active but releases are infrequent.

pip install pytextrank
error ModuleNotFoundError: No module named 'pytextrank'
cause PyTextRank is not installed, or it is installed in a different Python environment from the one running your script.
fix
Run 'python -m pip install pytextrank' using the same interpreter that runs spaCy.
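A quick, standard-library-only check of which interpreter is running and whether pytextrank is importable from it (a sketch; no third-party packages required):

```python
import importlib.util
import sys

# The interpreter actually running this script -- install into this one.
print(sys.executable)

# find_spec returns None when the module is not importable from this
# interpreter, i.e. it was installed elsewhere (or not at all).
def has_module(name: str) -> bool:
    return importlib.util.find_spec(name) is not None

print("pytextrank importable:", has_module("pytextrank"))
```

If this prints False, run 'python -m pip install pytextrank' with the same python shown by sys.executable.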
error KeyError: "Cannot add pipeline component 'pytextrank' - not found."
cause The component factory is not registered: either 'import pytextrank' is missing, or the wrong factory name was used (PyTextRank 3.x registers the component as 'textrank', not 'pytextrank').
fix
Add 'import pytextrank' before calling nlp.add_pipe('textrank').
error AttributeError: 'spacy.tokens.doc.Doc' object has no attribute '_'
cause The PyTextRank Doc extensions are not registered, usually because the component was never added to the pipeline or 'import pytextrank' never ran.
fix
Verify that nlp.add_pipe('textrank') is called after loading the spaCy model and before processing text.
breaking PyTextRank 3.x requires spaCy 3.x. It will not work with spaCy 2.x. If you have spaCy 2.x, install an older PyTextRank version (<=2.x) or upgrade spaCy.
fix Upgrade spaCy: pip install -U spacy
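A sketch of checking the installed spaCy major version before upgrading, using only the standard library (PyTextRank itself need not be installed; prints a notice if spaCy is absent):

```python
import importlib.metadata

# Read the installed spaCy version from package metadata, if present.
try:
    spacy_version = importlib.metadata.version("spacy")
except importlib.metadata.PackageNotFoundError:
    spacy_version = None

if spacy_version is None:
    print("spaCy is not installed")
else:
    major = int(spacy_version.split(".")[0])
    print("spaCy", spacy_version)
    if major < 3:
        print("PyTextRank 3.x needs spaCy 3.x -- run: pip install -U spacy")
```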
deprecated Accessing keyphrases via doc._.textrank is deprecated in favor of doc._.phrases. The old attribute will be removed in a future version.
fix Use doc._.phrases instead of doc._.textrank.
gotcha The spaCy model must be loaded before the PyTextRank component is added to it; calling add_pipe without a loaded model raises an error.
fix Ensure nlp = spacy.load('en_core_web_sm') before nlp.add_pipe('pytextrank').

Loads spaCy model, adds PyTextRank pipeline, processes text, and prints extracted keyphrases with rank and count.

import spacy
import pytextrank

# Load the spaCy model first, then add the PyTextRank component
nlp = spacy.load('en_core_web_sm')
nlp.add_pipe('textrank')

text = "Natural language processing enables computers to understand human language. It is used in chatbots and translation."
doc = nlp(text)

# Phrases are ordered by descending TextRank score; count is occurrences
for phrase in doc._.phrases:
    print(phrase.text, phrase.rank, phrase.count)