KeyBERT

0.9.0 · active · verified Mon Apr 13

KeyBERT is a minimal and easy-to-use Python library for keyword extraction that leverages state-of-the-art BERT embeddings to identify keywords and keyphrases most similar to a given document. Currently at version 0.9.0, it maintains an active release cadence with frequent updates improving performance, adding new features like LLM integration, and extending model backend support.

Install

pip install keybert

Imports

from keybert import KeyBERT

Quickstart

Initialize the KeyBERT model and use the `extract_keywords` method to retrieve relevant keywords from a document. The `top_n` parameter controls the number of keywords returned. Further parameters like `keyphrase_ngram_range`, `stop_words`, `use_mmr`, and `diversity` can be used to customize and diversify the extraction results.

from keybert import KeyBERT

doc = """
Supervised learning is the machine learning task of learning a function 
that maps an input to an output based on example input-output pairs. 
It infers a function from labeled training data consisting of a set of 
training examples. In supervised learning, each example is a pair 
consisting of an input object (typically a vector) and a desired 
output value (also called the supervisory signal).
"""

kw_model = KeyBERT()
keywords = kw_model.extract_keywords(doc, top_n=5)
print(keywords)

# Example with diversification (Maximal Marginal Relevance)
keywords_mmr = kw_model.extract_keywords(
    doc,
    keyphrase_ngram_range=(1, 3),
    stop_words='english',
    use_mmr=True,
    diversity=0.7,
    top_n=5,
)
print(keywords_mmr)
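To make the `use_mmr`/`diversity` trade-off concrete, the Maximal Marginal Relevance re-ranking can be sketched in plain Python. This is an illustrative toy, not KeyBERT's internal implementation: the `mmr` function name and the hard-coded similarity numbers (stand-ins for cosine similarities between candidate and document embeddings) are assumptions made for the example.

```python
def mmr(doc_sim, cand_sim, diversity, top_n):
    """Greedy Maximal Marginal Relevance selection.

    doc_sim[i]     -- similarity of candidate i to the document
    cand_sim[i][j] -- similarity between candidates i and j
    diversity      -- 0.0 = pure relevance, 1.0 = pure diversity
    """
    # Start with the candidate most similar to the document.
    selected = [max(range(len(doc_sim)), key=lambda i: doc_sim[i])]
    while len(selected) < top_n:
        remaining = [i for i in range(len(doc_sim)) if i not in selected]

        def score(i):
            # Penalize candidates that resemble anything already picked.
            redundancy = max(cand_sim[i][j] for j in selected)
            return (1 - diversity) * doc_sim[i] - diversity * redundancy

        selected.append(max(remaining, key=score))
    return selected


# Toy similarities: candidates 0 and 1 are near-duplicates; 2 is distinct.
doc_sim = [0.9, 0.85, 0.5]
cand_sim = [[1.0, 0.95, 0.1],
            [0.95, 1.0, 0.1],
            [0.1, 0.1, 1.0]]

print(mmr(doc_sim, cand_sim, diversity=0.7, top_n=2))  # [0, 2]: duplicate skipped
print(mmr(doc_sim, cand_sim, diversity=0.0, top_n=2))  # [0, 1]: pure relevance
```

With high `diversity`, the redundancy penalty dominates and the near-duplicate second keyword is replaced by a dissimilar one, which is why `diversity=0.7` in the Quickstart yields more varied keyphrases.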
