RAKE NLTK

1.0.6 · active · verified Thu Apr 16

RAKE-NLTK is a Python implementation of the Rapid Automatic Keyword Extraction (RAKE) algorithm built on the Natural Language Toolkit (NLTK). It extracts key phrases from text by scoring candidate phrases on word frequency and co-occurrence. The current version, 1.0.6 (released September 2021), provides a straightforward interface for keyword extraction, with configuration options for tokenizers, stopwords, and ranking metrics. Releases are infrequent; the last one dates from 2021.
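To make the frequency and co-occurrence scoring concrete, here is a minimal pure-Python sketch of the RAKE idea. It is not the library's implementation: the `STOPWORDS` set, the regex-based splitting, and the helper names `candidate_phrases` and `rake_scores` are all assumptions chosen for illustration, and it uses the degree-to-frequency ratio as the word score.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; rake-nltk uses NLTK's corpus by default).
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "the", "to"}


def candidate_phrases(text):
    """Split text into runs of consecutive non-stopword words."""
    phrases = []
    # Punctuation acts as a phrase boundary, so split on it first.
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake_scores(text):
    """Score each phrase as the sum of degree(word) / frequency(word)."""
    phrases = candidate_phrases(text)
    frequency = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            frequency[word] += 1
            # Degree grows with the length of every phrase the word appears in,
            # so words that co-occur in long phrases score higher.
            degree[word] += len(phrase)
    scores = {
        " ".join(phrase): sum(degree[w] / frequency[w] for w in phrase)
        for phrase in set(phrases)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


sample = "Keyword extraction is the task of finding key phrases in text."
for phrase, score in rake_scores(sample)[:2]:
    print(f"{score:.1f}  {phrase}")
```

In this sample every word occurs once, so a phrase's score is the square of its length: "finding key phrases" scores 9.0 and "keyword extraction" scores 4.0. This is why RAKE tends to favor longer multi-word phrases.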

Install

pip install rake-nltk

Imports

from rake_nltk import Rake

Quickstart

Download the required NLTK data, then initialize a Rake object (which uses NLTK's English stopwords and standard punctuation by default) and extract ranked key phrases from a block of text.

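import nltk

# One-time download of the NLTK data that Rake relies on by default.
nltk.download('stopwords')  # default stopword list
nltk.download('punkt')      # sentence tokenizer models
nltk.download('punkt_tab')  # newer NLTK releases look up this resource instead of 'punkt'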

from rake_nltk import Rake

text = """Compatibility of systems of diophantine equations, strict inequations, and nonstrict inequations are considered. Upper bounds for components of a minimal set of solutions and algorithms of construction of minimal generating sets of solutions for all types of systems are given. These criteria and the corresponding algorithms for constructing a minimal supporting set of solutions can be used in solving all the considered types of systems and systems of mixed types."""

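# With no arguments, Rake uses NLTK's English stopwords and standard punctuation.
r = Rake()

# Extract candidate phrases from the text and score them.
r.extract_keywords_from_text(text)

# Phrases are returned ranked from highest to lowest score.
ranked_phrases = r.get_ranked_phrases()
# Same ranking, as (score, phrase) tuples.
ranked_phrases_with_scores = r.get_ranked_phrases_with_scores()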

print("Top 5 ranked phrases:")
for phrase in ranked_phrases[:5]:
    print(f"- {phrase}")

print("\nTop 5 ranked phrases with scores:")
for score, phrase in ranked_phrases_with_scores[:5]:
    print(f"- {phrase} (Score: {score:.2f})")
