N-gram Fuzzy Search

4.0.3 · active · verified Thu Apr 16

The `ngram` library provides `NGram`, a subclass of Python's built-in `set` that supports fuzzy searching of its members by N-gram string similarity. It also offers static methods for comparing pairs of strings directly. The N-grams are character-based, not word-based: the library measures string similarity and is not intended for language modeling. The library is actively maintained; the current version is 4.0.3.
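
To make "character-based N-gram similarity" concrete, here is a minimal pure-Python sketch that does not use the library. It assumes each string is padded with N-1 `$` characters on both sides and that similarity is the number of shared n-grams divided by the total number of n-grams across both strings. The helper names `char_ngrams` and `ngram_similarity` are hypothetical, and the library's own computation may differ in details such as its `warp` setting.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3, pad_char: str = "$") -> Counter:
    """Count the character n-grams of `text`, padded with n-1 pad characters per side."""
    pad = pad_char * (n - 1)
    padded = f"{pad}{text}{pad}"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Shared n-grams divided by all n-grams of both strings (shared ones counted once)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    shared = sum((grams_a & grams_b).values())  # multiset intersection
    total = sum(grams_a.values()) + sum(grams_b.values()) - shared
    return shared / total if total else 0.0


print(sorted(char_ngrams("appl", n=2)))                   # ['$a', 'ap', 'l$', 'pl', 'pp']
print(round(ngram_similarity("appl", "apple", n=2), 3))   # 0.571
```

Padding gives the first and last characters of a string as many n-grams as the interior ones, so matching prefixes and suffixes count toward the score.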

Quickstart

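Initialize an `NGram` instance with a collection of strings, or with arbitrary objects plus a `key` function that maps each object to the string used for comparison. Add items as you would with a `set`, then call `search` to get the members whose N-gram similarity to a query string meets a threshold, returned as `(item, similarity)` pairs sorted from most to least similar. For a one-off comparison of two strings, use `NGram.compare`.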

from ngram import NGram

# Initialize an NGram object with a list of items
# N (default 3) is the size of n-grams to use for comparison
fuzzy_set = NGram(N=2, items=['apple', 'apricot', 'banana', 'orange', 'grape'])

# Add more items to the set
fuzzy_set.add('apply')

# Search for items similar to a query string
# The threshold sets the minimum similarity score; if omitted, search uses
# the threshold passed to the constructor (0.0 unless specified)
results = fuzzy_set.search('appl', threshold=0.5)
print(f"Searching for 'appl': {results}")
# With N=2, 'apple' and 'apply' should each score 4/7 (about 0.57) against 'appl';
# 'apricot' scores about 0.18 and falls below the threshold

# Directly compare two strings
similarity = NGram.compare('apple', 'apply', N=2)
print(f"Similarity between 'apple' and 'apply': {similarity}")
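
The `key` function mentioned above can be illustrated with a brute-force stand-in written in plain Python. This is a sketch under stated assumptions, not the library's implementation: `Product`, `_grams`, and `fuzzy_search` are hypothetical names, every item is scored on each call instead of being looked up in an n-gram index, and the similarity is assumed to be shared n-grams over total n-grams with `$` padding.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:  # hypothetical record type, for illustration only
    sku: int
    name: str


def _grams(text, n):
    """Count padded character n-grams (assumed padding: n-1 '$' per side)."""
    padded = "$" * (n - 1) + text + "$" * (n - 1)
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def fuzzy_search(items, query, key, threshold=0.0, n=3):
    """Score every item by the string key(item) returns; keep scores >= threshold."""
    query_grams = _grams(query, n)
    scored = []
    for item in items:
        item_grams = _grams(key(item), n)
        shared = sum((query_grams & item_grams).values())
        total = sum(query_grams.values()) + sum(item_grams.values()) - shared
        score = shared / total if total else 0.0
        if score >= threshold:
            scored.append((item, score))
    # Most similar first, mirroring the (item, similarity) result shape
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


catalog = [Product(1, "apple"), Product(2, "apricot"), Product(3, "banana")]
hits = fuzzy_search(catalog, "appl", key=lambda p: p.name, threshold=0.5, n=2)
print([(p.name, round(score, 3)) for p, score in hits])  # [('apple', 0.571)]
```

The results contain the original objects, not the key strings, so callers can read other fields such as `sku` directly from a match.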
