{"id":7874,"library":"yake","title":"YAKE! Keyword Extraction","description":"YAKE! (Yet Another Keyword Extractor) is a lightweight, unsupervised Python library for automatic keyword extraction. It identifies the most relevant keywords from a document using statistical text features, without requiring training data, external corpora, or dictionaries, and supports multiple languages. Currently at version 0.7.3, YAKE! maintains an active development pace with recent updates focusing on performance and adding lemmatization capabilities.","status":"active","version":"0.7.3","language":"en","source_language":"en","source_url":"https://github.com/INESCTEC/yake","tags":["nlp","keyword-extraction","text-mining","unsupervised"],"install":[{"cmd":"pip install yake","lang":"bash","label":"Install latest stable version"}],"dependencies":[{"reason":"CLI table formatting","package":"tabulate"},{"reason":"Sentence and token segmentation","package":"segtok"},{"reason":"Graph manipulation","package":"networkx"},{"reason":"Numerical operations","package":"numpy>=1.24.0"},{"reason":"Command-line interface tools","package":"click>=6.0"},{"reason":"String comparison for deduplication","package":"jellyfish"},{"reason":"For lemmatization (optional dependency 'lemmatization')","package":"spacy>=3.8.0","optional":true},{"reason":"For lemmatization and other text processing (optional dependency 'lemmatization')","package":"nltk>=3.8.0","optional":true}],"imports":[{"note":"The `KeywordExtractor` class is directly available from the top-level `yake` package after `import yake`.","wrong":"import yake.KeywordExtractor","symbol":"KeywordExtractor","correct":"from yake import KeywordExtractor"}],"quickstart":{"code":"import yake\n\ntext = \"\"\"Sources tell us that Google is acquiring Kaggle, a platform that hosts data science and machine learning competitions.\nDetails about the transaction remain somewhat vague, but given that Google is hosting its Cloud Next conference in San Francisco this 
week,\nthe official announcement could come as early as tomorrow. Reached by phone, Kaggle co-founder CEO Anthony Goldbloom declined\nto deny that the acquisition is happening. Google itself declined 'to comment on rumors'. Kaggle, which has about half a million\ndata scientists on its platform, was founded by Goldbloom and Ben Hamner in 2010.\"\"\"\n\n# Default parameters\nkw_extractor = yake.KeywordExtractor()\nkeywords = kw_extractor.extract_keywords(text)\n\nprint(\"Keywords (default settings):\")\nfor kw, score in keywords:\n    print(f\"Keyphrase: {kw}, Score: {score}\")\n\n# Customizing parameters\n# lan: language, n: max n-gram size, dedupLim: deduplication threshold,\n# dedupFunc: deduplication function, windowsSize: window size, top: number of keywords\ncustom_kw_extractor = yake.KeywordExtractor(lan=\"en\", n=3, dedupLim=0.9, dedupFunc='seqm', windowsSize=3, top=10, features=None)\nkeywords_custom = custom_kw_extractor.extract_keywords(text)\n\nprint(\"\\nKeywords (custom settings):\")\nfor kw, score in keywords_custom:\n    print(f\"Keyphrase: {kw}, Score: {score}\")","lang":"python","description":"Initializes `KeywordExtractor` with default or custom parameters and extracts top keywords from a given text. The output is a list of (keyword, score) tuples."},"warnings":[{"fix":"Refer to the GitHub releases and documentation for migration details and updated `KeywordExtractor` usage patterns.","message":"Version 0.6.0 introduced a 'Refactored version of YAKE!'. 
Users upgrading from versions prior to 0.6.0 may encounter breaking API changes, particularly in how `KeywordExtractor` is initialized or its methods are called.","severity":"breaking","affected_versions":"<0.6.0 to >=0.6.0"},{"fix":"Ensure necessary NLTK data is present by running `import nltk; nltk.download('punkt')` (or other required data) in a Python interpreter or script before using lemmatization.","message":"If using YAKE!'s lemmatization features (enabled via `spacy` or `nltk` optional dependencies), NLTK data (e.g., 'punkt') might be required, leading to `LookupError` if not downloaded.","severity":"gotcha","affected_versions":"All versions with lemmatization"},{"fix":"Ensure you install the current, actively maintained version by using `pip install yake`. The current official repository is `https://github.com/INESCTEC/yake`.","message":"An older `yake` package (e.g., v0.3.x) is present on PyPI and is officially deprecated and unmaintained. Installing this older version will lead to outdated functionality and no support.","severity":"deprecated","affected_versions":"0.3.x"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"pip install yake","cause":"The 'yake' package is not installed or the Python interpreter cannot find it in the current environment.","error":"ModuleNotFoundError: No module named 'yake'"},{"fix":"Run `import nltk; nltk.download('punkt')` in a Python interpreter or script to download the necessary NLTK data.","cause":"The `punkt` tokenizer data, a common NLTK resource, has not been downloaded, and YAKE! (or its underlying dependencies for lemmatization) is attempting to use it.","error":"LookupError: Resource punkt not found. 
Please use the NLTK Downloader to obtain the resource: >>> import nltk >>> nltk.download('punkt')"},{"fix":"Unpack each tuple while iterating, or index it by integer position (`item[0]` for the keyword, `item[1]` for the score): `for kw, score in keywords: print(f\"Keyword: {kw}, Score: {score}\")`","cause":"The `extract_keywords` method returns a list of `(keyword, score)` tuples. Tuples accept only integer or slice indices, so this error occurs when a result is accessed like a dictionary (e.g., `item['keyword']`). Indexing the returned list itself with a string key raises the analogous `TypeError: list indices must be integers or slices, not str`.","error":"TypeError: tuple indices must be integers or slices, not str"}]}