{"id":8351,"library":"ngram","title":"N-gram Fuzzy Search","description":"The `ngram` library provides `NGram`, a subclass of Python's built-in `set` that supports fuzzy search of its members by N-gram string similarity, along with static methods for comparing a pair of strings. The N-grams are character-based, not word-based: the library measures string similarity and is not meant for language modeling. The library is actively maintained; the current version is 4.0.3.","status":"active","version":"4.0.3","language":"en","source_language":"en","source_url":"https://github.com/gpoulter/python-ngram","tags":["nlp","fuzzy-matching","search","n-gram","similarity"],"install":[{"cmd":"pip install ngram","lang":"bash","label":"Install `ngram`"}],"dependencies":[],"imports":[{"note":"The primary class for fuzzy matching is `NGram`, which should be imported directly from the `ngram` package.","wrong":"import ngram; ngram.NGram()","symbol":"NGram","correct":"from ngram import NGram"}],"quickstart":{"code":"from ngram import NGram\n\n# Build an NGram set from a list of items\n# N (default 3) is the n-gram size used for comparison\nfuzzy_set = NGram(N=2, items=['apple', 'apricot', 'banana', 'orange', 'grape'])\n\n# Add more items to the set\nfuzzy_set.add('apply')\n\n# Search for items similar to a query string\n# threshold is the minimum similarity (0.0 to 1.0) an item needs to match;\n# if omitted, search() uses the threshold given to the constructor (default 0.0)\nresults = fuzzy_set.search('appl', threshold=0.5)\nprint(f\"Searching for 'appl': {results}\")\n# results is a list of (item, similarity) tuples, sorted by decreasing similarity\n\n# Directly compare two strings\nsimilarity = NGram.compare('apple', 'apply', N=2)\nprint(f\"Similarity between 'apple' and 'apply': {similarity}\")","lang":"python","description":"Initialize an `NGram` instance with a collection of strings or objects (optionally with a `key` function for non-string items). 
Add items and use the `search` method to find members with high N-gram similarity to a query string. You can also use `NGram.compare` for direct string comparison."},"warnings":[{"fix":"Be aware of the character-based nature. For word N-grams, tokenize your text into words first and then apply appropriate N-gram logic, possibly with a different NLP-focused library.","message":"The `ngram` library is designed for character-based N-grams by default, not word-based. This means it splits strings into sequences of characters, not words. If you require word N-grams for natural language processing tasks, you will need to pre-process your text or use a different library (e.g., NLTK).","severity":"gotcha","affected_versions":"All versions"},{"fix":"If pickling `NGram` instances is required, use a named function for the `key` parameter instead of a lambda function. For example, `def get_name(obj): return obj.name; NGram(items, key=get_name)`.","message":"When initializing `NGram` with a `key` function to convert items to strings (e.g., `NGram(items, key=str)` or `NGram(items, key=lambda x: x.name)`), using an anonymous (lambda) function will prevent the resulting `NGram` object from being pickled (serialized).","severity":"gotcha","affected_versions":"All versions"},{"fix":"Always pass Unicode strings to `NGram` in Python 3. If dealing with raw byte data, decode it to Unicode first (e.g., `my_bytes.decode('utf-8')`) before indexing or searching.","message":"In Python 2, `NGram` could behave unexpectedly with non-ASCII byte-strings due to splitting on byte boundaries. While Python 3 primarily uses Unicode strings, ensuring all inputs to `NGram` are proper Unicode strings is crucial for correct multi-byte character handling.","severity":"gotcha","affected_versions":"Python 2.x and early Python 3.x where byte-string confusion was common. 
Less of an issue in modern Python 3, but still a consideration."}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Run `pip install ngram` in the Python environment that runs your script.","cause":"The `ngram` package is not installed in the current Python environment.","error":"ModuleNotFoundError: No module named 'ngram'"},{"fix":"Rename your script to something other than `ngram.py` (e.g., `my_ngram_app.py`) and run it again.","cause":"This typically occurs when your own script is named `ngram.py`. It shadows the installed library, so `import ngram` loads your script, which does not define `NGram`, and `ngram.NGram` then fails. With `from ngram import NGram`, the same shadowing surfaces as an `ImportError` instead.","error":"AttributeError: module 'ngram' has no attribute 'NGram'"}]}