{"id":7657,"library":"rake-nltk","title":"RAKE NLTK","description":"RAKE-NLTK is a Python implementation of the Rapid Automatic Keyword Extraction (RAKE) algorithm, leveraging the Natural Language Toolkit (NLTK). It's designed to extract key phrases from text by analyzing word frequency and co-occurrence. The library, currently at version 1.0.6 (released September 2021), provides a straightforward interface for keyword extraction and offers configuration options for tokenizers, stopwords, and ranking metrics. Its release cadence is infrequent, with the last major update in 2021.","status":"active","version":"1.0.6","language":"en","source_language":"en","source_url":"https://github.com/csurfer/rake-nltk","tags":["nlp","text-mining","keyword-extraction","nltk","rake"],"install":[{"cmd":"pip install rake-nltk","lang":"bash","label":"Install using pip"}],"dependencies":[{"reason":"Required for natural language processing functionalities like tokenization and stopwords. Specific NLTK corpora ('stopwords', 'punkt') must be downloaded separately.","package":"nltk","optional":false}],"imports":[{"symbol":"Rake","correct":"from rake_nltk import Rake"}],"quickstart":{"code":"import nltk\nnltk.download('stopwords')\nnltk.download('punkt')\n\nfrom rake_nltk import Rake\n\ntext = \"\"\"Compatibility of systems of diophantine equations, strict inequations, and nonstrict inequations are considered. Upper bounds for components of a minimal set of solutions and algorithms of construction of minimal generating sets of solutions for all types of systems are given. 
These criteria and the corresponding algorithms for constructing a minimal supporting set of solutions can be used in solving all the considered types of systems and systems of mixed types.\"\"\"\n\n# Default Rake: NLTK English stopwords plus string.punctuation are ignored\nr = Rake()\n\nr.extract_keywords_from_text(text)\n# Both lists are ordered from highest to lowest score\nranked_phrases = r.get_ranked_phrases()\nranked_phrases_with_scores = r.get_ranked_phrases_with_scores()\n\nprint(\"Top 5 ranked phrases:\")\nfor phrase in ranked_phrases[:5]:\n    print(f\"- {phrase}\")\n\nprint(\"\\nTop 5 ranked phrases with scores:\")\n# Each entry is a (score, phrase) tuple\nfor score, phrase in ranked_phrases_with_scores[:5]:\n    print(f\"- {phrase} (Score: {score:.2f})\")\n","lang":"python","description":"Initialize a `Rake` object with its defaults (NLTK English stopwords and Python's `string.punctuation`) and extract ranked key phrases from a text. The example also downloads the required NLTK data."},"warnings":[{"fix":"Run `import nltk; nltk.download('stopwords'); nltk.download('punkt')` once after installing `nltk` and `rake-nltk`. On recent NLTK releases (3.9 and later), sentence tokenization loads the `punkt_tab` resource, so also run `nltk.download('punkt_tab')` if the error names it.","message":"The NLTK `stopwords` corpus and `punkt` tokenizer are required by `rake-nltk` and must be downloaded separately. Without them, the library raises a `LookupError`.","severity":"breaking","affected_versions":"All versions of rake-nltk"},{"fix":"Prefer `pip install rake-nltk`. If installing from source, install `nltk` beforehand (`pip install nltk`) and consider updating `pip` to a recent version.","message":"Installing `rake-nltk` directly from a cloned GitHub repository using `python setup.py install` can sometimes lead to an `error: package directory 'rake_nltk' does not exist`, especially in older `pip` versions or specific build environments. 
This is often related to how NLTK dependencies or post-install hooks are handled during the build process.","severity":"gotcha","affected_versions":"Potentially all versions, more common with older pip/direct source installs."}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Run `import nltk; nltk.download('stopwords'); nltk.download('punkt')` in your Python environment. If the error names a different resource (for example `punkt_tab` on recent NLTK releases), download that one as well.","cause":"A required NLTK resource (e.g., 'stopwords' or 'punkt') has not been downloaded.","error":"LookupError: <resource> not found. Please use the NLTK Downloader to obtain the resource:"},{"fix":"Use `pip install rake-nltk` instead of installing from source. If the problem persists, make sure `nltk` is installed first.","cause":"Installing `rake-nltk` by cloning the repository and running `python setup.py install`, which can fail in some build environments or with older `pip` versions that do not handle dependencies correctly during setup.","error":"error: package directory 'rake_nltk' does not exist"},{"fix":"Check that the input text has enough content and contains words that are not stop words. If you pass a custom `stopwords` list, `punctuations` set, or `language` to `Rake`, confirm they match the text.","cause":"Missing NLTK data raises a `LookupError` rather than producing an empty list, so an empty or unexpected result usually means the input text is too short, lacks significant keywords, or is composed mostly of stop words and punctuation.","error":"r.get_ranked_phrases() returns an empty list or unexpected results"}]}