YAKE! Keyword Extraction

0.7.3 · active · verified Thu Apr 16

YAKE! (Yet Another Keyword Extractor) is a lightweight, unsupervised Python library for automatic keyword extraction. It identifies the most relevant keywords in a document using statistical text features, without requiring training data, external corpora, or dictionaries, and supports multiple languages. The current version is 0.7.3; the project is actively developed, and recent releases focus on performance and add lemmatization support.

Common errors

Warnings

Install

Imports

Quickstart

Initializes `KeywordExtractor` first with default and then with custom parameters, and extracts the top keywords from a sample text. `extract_keywords` returns a list of (keyword, score) tuples; in YAKE!, a lower score indicates a more relevant keyword.

import yake

text = """Sources tell us that Google is acquiring Kaggle, a platform that hosts data science and machine learning competitions.
Details about the transaction remain somewhat vague, but given that Google is hosting its Cloud Next conference in San Francisco this week,
the official announcement could come as early as tomorrow. Reached by phone, Kaggle co-founder CEO Anthony Goldbloom declined
to deny that the acquisition is happening. Google itself declined 'to comment on rumors'. Kaggle, which has about half a million
data scientists on its platform, was founded by Goldbloom and Ben Hamner in 2010."""

# Default parameters
kw_extractor = yake.KeywordExtractor()
keywords = kw_extractor.extract_keywords(text)

print("Keywords (default settings):")
for kw, score in keywords:
    print(f"Keyphrase: {kw}, Score: {score}")

# Customizing parameters
# lan: language code, n: maximum n-gram size, dedupLim: deduplication threshold,
# dedupFunc: deduplication function, windowsSize: context window size,
# top: number of keywords to return
custom_kw_extractor = yake.KeywordExtractor(lan="en", n=3, dedupLim=0.9, dedupFunc='seqm', windowsSize=3, top=10, features=None)
keywords_custom = custom_kw_extractor.extract_keywords(text)

print("\nKeywords (custom settings):")
for kw, score in keywords_custom:
    print(f"Keyphrase: {kw}, Score: {score}")
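
Because the result is a plain list of (keyword, score) tuples and lower scores mean more relevant keywords, post-processing needs nothing beyond the standard library. The sketch below shows one possible way to rank and filter such a list. The helper `select_keywords`, its `max_score` and `limit` parameters, and the sample scores are hypothetical illustrations and not part of the YAKE! API; the scores are made up and are not real extractor output.

```python
from typing import List, Optional, Tuple

Keyword = Tuple[str, float]


def select_keywords(
    keywords: List[Keyword],
    max_score: Optional[float] = None,
    limit: Optional[int] = None,
) -> List[Keyword]:
    """Hypothetical helper: sort best-first and optionally filter.

    Assumes the YAKE! convention that a lower score is more relevant.
    """
    # Sort ascending by score so the most relevant keyword comes first.
    ranked = sorted(keywords, key=lambda pair: pair[1])
    # Optionally keep only keywords at or below a score cutoff.
    if max_score is not None:
        ranked = [(kw, score) for kw, score in ranked if score <= max_score]
    # Optionally cap the number of keywords returned.
    if limit is not None:
        ranked = ranked[:limit]
    return ranked


# Made-up scores for illustration only; not real YAKE! output.
sample = [
    ("data science", 0.12),
    ("google", 0.03),
    ("san francisco", 0.25),
    ("kaggle", 0.05),
]

for kw, score in select_keywords(sample, max_score=0.2, limit=2):
    print(f"Keyphrase: {kw}, Score: {score}")
```

In practice, the list returned by `extract_keywords(text)` could be passed in place of `sample`. A suitable `max_score` cutoff depends on the text and the extractor settings, so it is best chosen by inspecting real output first.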
