Flashtext Keyword Processor

2.7 · maintenance · verified Sat Apr 11

Flashtext is a Python library for efficient keyword extraction and replacement in text. It uses a custom algorithm based on the Aho-Corasick algorithm and a trie data structure, which gives significant performance gains over regular expressions, especially when the keyword dictionary is large. The current stable version is 2.7, released in 2018. The project is largely in maintenance mode, though it is still widely used.
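To make the trie idea concrete, here is a minimal pure-Python sketch of trie-based keyword matching. It is an illustration under simplifying assumptions, not Flashtext's actual implementation: the names `build_trie`, `extract`, `_END`, and `_WORD_CHARS` are hypothetical, matching is always case-insensitive, and word boundaries are approximated with ASCII letters, digits, and underscore.

```python
import string

# Hypothetical sentinel key marking "a keyword ends at this node".
_END = "_clean_name_"
# Simplified notion of a word character (assumption for this sketch).
_WORD_CHARS = set(string.ascii_letters + string.digits + "_")


def build_trie(keywords):
    """Build a character trie from a {keyword: clean_name} mapping."""
    root = {}
    for keyword, clean_name in keywords.items():
        node = root
        for ch in keyword.lower():
            node = node.setdefault(ch, {})
        node[_END] = clean_name
    return root


def extract(trie, text):
    """Return clean names of the longest keywords found at word boundaries."""
    found = []
    lowered = text.lower()
    i, n = 0, len(lowered)
    while i < n:
        # A match may only begin at the start of a word.
        if i > 0 and lowered[i - 1] in _WORD_CHARS:
            i += 1
            continue
        node, j = trie, i
        best_name, best_end = None, i
        while j < n and lowered[j] in node:
            node = node[lowered[j]]
            j += 1
            # Record a match only if it also ends at a word boundary.
            if _END in node and (j == n or lowered[j] not in _WORD_CHARS):
                best_name, best_end = node[_END], j
        if best_name is not None:
            found.append(best_name)
            i = best_end  # resume scanning after the matched keyword
        else:
            i += 1
    return found


trie = build_trie({"Big Apple": "New York", "Bay Area": "Bay Area"})
print(extract(trie, "I love Big Apple and Bay Area."))  # ['New York', 'Bay Area']
```

In this sketch, the work done at each text position is bounded by the length of the longest keyword, not by how many keywords are stored. That is the intuition behind the advantage over trying one regular-expression alternative after another.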

Warnings

Install

Imports

Quickstart

This example demonstrates how to initialize the `KeywordProcessor`, add keywords with optional clean names, and then use it to extract or replace keywords in a given text. It also shows how to get span information for extracted keywords.

from flashtext import KeywordProcessor

# Initialize the keyword processor (case_sensitive=False by default)
keyword_processor = KeywordProcessor()

# Add keywords. Can map multiple 'unclean' names to one 'clean' name.
keyword_processor.add_keyword('Big Apple', 'New York')
keyword_processor.add_keyword('Bay Area')
keyword_processor.add_keyword('New Delhi', 'NCR region')

# Extract keywords
text_to_extract = 'I love Big Apple and Bay Area. New Delhi is also great.'
keywords_found = keyword_processor.extract_keywords(text_to_extract)
print(f"Extracted keywords: {keywords_found}") # Expected: ['New York', 'Bay Area', 'NCR region']

# Replace keywords
text_to_replace = 'I love Big Apple and new delhi.'
new_sentence = keyword_processor.replace_keywords(text_to_replace)
print(f"Replaced sentence: {new_sentence}") # Expected: 'I love New York and NCR region.'

# Extract with span information
keywords_with_span = keyword_processor.extract_keywords('I love Big Apple.', span_info=True)
print(f"Keywords with span: {keywords_with_span}") # Expected: [('New York', 7, 16)]
