{"id":2951,"library":"flashtext","title":"Flashtext Keyword Processor","description":"Flashtext is a Python library for fast keyword extraction and replacement in text. It uses a custom algorithm inspired by the Aho-Corasick algorithm and the trie data structure, which makes it considerably faster than regular expressions when the keyword dictionary is large. The current stable version is 2.7, released in 2018; the project is in maintenance mode but still widely used.","status":"maintenance","version":"2.7","language":"en","source_language":"en","source_url":"https://github.com/vi3k6i5/flashtext","tags":["text-processing","keyword-extraction","keyword-replacement","nlp","performance"],"install":[{"cmd":"pip install flashtext","lang":"bash","label":"Install stable version"}],"dependencies":[],"imports":[{"symbol":"KeywordProcessor","correct":"from flashtext import KeywordProcessor"}],"quickstart":{"code":"from flashtext import KeywordProcessor\n\n# Initialize the keyword processor (case_sensitive=False by default)\nkeyword_processor = KeywordProcessor()\n\n# Add keywords. Several keywords can map to one clean name;\n# a keyword added without a clean name maps to itself.\nkeyword_processor.add_keyword('Big Apple', 'New York')\nkeyword_processor.add_keyword('Bay Area')\nkeyword_processor.add_keyword('New Delhi', 'NCR region')\n\n# Extract keywords\ntext_to_extract = 'I love Big Apple and Bay Area. 
New Delhi is also great.'\nkeywords_found = keyword_processor.extract_keywords(text_to_extract)\nprint(f\"Extracted keywords: {keywords_found}\") # Expected: ['New York', 'Bay Area', 'NCR region']\n\n# Replace keywords\ntext_to_replace = 'I love Big Apple and new delhi.'\nnew_sentence = keyword_processor.replace_keywords(text_to_replace)\nprint(f\"Replaced sentence: {new_sentence}\") # Expected: 'I love New York and NCR region.'\n\n# Extract with span information\nkeywords_with_span = keyword_processor.extract_keywords('I love Big Apple.', span_info=True)\nprint(f\"Keywords with span: {keywords_with_span}\") # Expected: [('New York', 7, 16)]","lang":"python","description":"This example demonstrates how to initialize the `KeywordProcessor`, add keywords with optional clean names, and then use it to extract or replace keywords in a given text. It also shows how to get span information for extracted keywords."},"warnings":[{"fix":"For non-Latin languages or custom boundary needs, adjust the `non_word_boundaries` set after constructing the processor; the constructor does not take a `non_word_boundaries` argument. Example: `kp = KeywordProcessor(); kp.add_non_word_boundary('@'); kp.add_non_word_boundary('#')`. Use `kp.set_non_word_boundaries(...)` to replace the whole set.","message":"By default, Flashtext treats only `[A-Za-z0-9_]` as word characters (its `non_word_boundaries` set) and every other character as a word boundary. This default might not suit all languages (e.g., Chinese, Japanese) or custom requirements: keywords next to characters outside this set, such as accented or non-Latin letters, may be matched or missed in unexpected ways. Users can customize `non_word_boundaries`.","severity":"gotcha","affected_versions":"2.0 - 2.7"},{"fix":"Benchmark performance for your specific use case. If the keyword count is low or complex pattern matching is needed, consider standard regex. If the keyword count is high, Flashtext is usually the faster choice.","message":"Flashtext generally outperforms regex for keyword extraction/replacement when the number of keywords is large (typically >500). 
For a small number of keywords, or when complex patterns (like partial matches or special character handling) are required, regular expressions might be equally or more efficient, or simply the only solution.","severity":"gotcha","affected_versions":"2.0 - 2.7"},{"fix":"Pass only string `clean_name` values to `add_keyword()` if you intend to use `replace_keywords()`. Tuple clean names are primarily for returning richer information from extraction.","message":"If `add_keyword()` is called with a tuple as the `clean_name` (e.g., `add_keyword('Taj Mahal', ('Monument', 'Taj Mahal'))`), `replace_keywords()` will not work as expected for that keyword, because it expects the clean name to be a string that it can substitute into the text, not a tuple.","severity":"gotcha","affected_versions":"2.0 - 2.7"},{"fix":"Consider migrating to `flashtext2` for improved performance and broader language support, especially if you hit performance bottlenecks or Unicode issues with the original `flashtext`. The core API is similar, but be aware of potential API differences or slight behavior changes.","message":"A separate, community-driven package `flashtext2` (and `flashtextr`) exists. It is a rewrite in Rust that reportedly offers significant performance improvements (3-10x faster) and better Unicode handling. While not an official successor from the original author, it addresses some limitations of `flashtext`.","severity":"deprecated","affected_versions":"All versions of `flashtext` (2.x)"}],"env_vars":null,"last_verified":"2026-04-11T00:00:00.000Z","next_check":"2026-07-10T00:00:00.000Z"}