{"id":5281,"library":"keybert","title":"KeyBERT","description":"KeyBERT is a minimal and easy-to-use Python library for keyword extraction that leverages state-of-the-art BERT embeddings to identify the keywords and keyphrases most similar to a given document. Currently at version 0.9.0, it maintains an active release cadence, with frequent updates improving performance, adding new features such as LLM integration, and extending embedding-model backend support.","status":"active","version":"0.9.0","language":"en","source_language":"en","source_url":"https://github.com/MaartenGr/KeyBERT.git","tags":["NLP","keyword extraction","BERT","transformers","machine learning","natural language processing"],"install":[{"cmd":"pip install keybert","lang":"bash","label":"Default Installation"},{"cmd":"pip install keybert[flair] keybert[gensim] keybert[spacy] keybert[use] keybert[hf]","lang":"bash","label":"Install with specific embedding model backends"},{"cmd":"pip install keybert --no-deps scikit-learn model2vec","lang":"bash","label":"Light-weight installation (without sentence-transformers)"}],"dependencies":[{"reason":"Default and recommended backend for embedding models.","package":"sentence-transformers","optional":false},{"reason":"Required by sentence-transformers for BERT models.","package":"torch","optional":false},{"reason":"Provides the CountVectorizer used for candidate keyword generation.","package":"scikit-learn","optional":false},{"reason":"Required for using KeyLLM with OpenAI models.","package":"openai","optional":true},{"reason":"Optional embedding model backend.","package":"flair","optional":true},{"reason":"Optional embedding model backend.","package":"gensim","optional":true},{"reason":"Optional embedding model backend.","package":"spacy","optional":true},{"reason":"Required for the 'use' (Universal Sentence Encoder) optional backend.","package":"tensorflow_text","optional":true},{"reason":"Used for the lightweight installation; provides alternative embeddings.","package":"model2vec","optional":true}],"imports":[{"symbol":"KeyBERT","correct":"from keybert import KeyBERT"},{"note":"KeyLLM is typically imported directly from keybert for convenience, though its components, such as the OpenAI backend, live in keybert.llm.","wrong":"from keybert.llm import KeyLLM","symbol":"KeyLLM","correct":"from keybert import KeyLLM"},{"note":"The OpenAI LLM integration class is located in the keybert.llm submodule.","wrong":"from keybert import OpenAI","symbol":"OpenAI","correct":"from keybert.llm import OpenAI"}],"quickstart":{"code":"from keybert import KeyBERT\n\ndoc = \"\"\"\nSupervised learning is the machine learning task of learning a function\nthat maps an input to an output based on example input-output pairs.\nIt infers a function from labeled training data consisting of a set of\ntraining examples. In supervised learning, each example is a pair\nconsisting of an input object (typically a vector) and a desired\noutput value (also called the supervisory signal).\n\"\"\"\n\nkw_model = KeyBERT()\nkeywords = kw_model.extract_keywords(doc, top_n=5)\nprint(keywords)\n\n# Example with diversification (Maximal Marginal Relevance)\nkeywords_mmr = kw_model.extract_keywords(doc, keyphrase_ngram_range=(1, 3),\n                                         stop_words='english',\n                                         use_mmr=True, diversity=0.7, top_n=5)\nprint(keywords_mmr)","lang":"python","description":"Initialize the KeyBERT model and use the `extract_keywords` method to retrieve relevant keywords from a document. The `top_n` parameter controls the number of keywords returned. Further parameters such as `keyphrase_ngram_range`, `stop_words`, `use_mmr`, and `diversity` can be used to customize and diversify the extraction results."},"warnings":[{"fix":"Upgrade your Python environment to version 3.8 or higher.","message":"Support for Python versions 3.6 and 3.7 was dropped in KeyBERT version 0.8.5. Users on older Python versions must upgrade to Python 3.8 or newer.","severity":"breaking","affected_versions":">=0.8.5"},{"fix":"Ensure your `openai` package is version 1.0 or higher when using `KeyLLM` with OpenAI models (e.g., `pip install \"openai>=1\"`).","message":"KeyBERT's `KeyLLM` integration with the OpenAI API required updates for `openai>=1`. Older `openai` library versions (pre-1.0) are incompatible.","severity":"breaking","affected_versions":">=0.8.3"},{"fix":"Run KeyBERT on a system with a GPU. When processing many documents, pass a list of documents to `kw_model.extract_keywords()` instead of iterating over them individually.","message":"For large datasets or improved performance, a GPU is highly recommended. Processing multiple documents in a single `extract_keywords` call significantly speeds up inference because words are embedded only once.","severity":"gotcha","affected_versions":"All"},{"fix":"Pass `use_mmr=True` (and optionally the `diversity` parameter) or `use_maxsum=True` (and `nr_candidates`) to the `extract_keywords` method to enable diversification.","message":"By default, KeyBERT ranks candidates by cosine similarity, which can produce very similar keywords. To get more diverse keywords, use a diversification technique such as Max Sum Distance or Maximal Marginal Relevance (MMR).","severity":"gotcha","affected_versions":"All"},{"fix":"While not always necessary, clean documents of irrelevant noise such as HTML tags or other structural elements that do not contribute to the semantic meaning; this improves keyword extraction.","message":"KeyBERT generally doesn't require extensive text preprocessing thanks to BERT's contextual understanding. However, noisy data (e.g., HTML tags) can negatively impact results.","severity":"gotcha","affected_versions":"All"}],"env_vars":null,"last_verified":"2026-04-13T00:00:00.000Z","next_check":"2026-07-12T00:00:00.000Z"}