{"id":21796,"library":"pysimstring","title":"PySimString","description":"PySimString is a Python implementation of the simstring fast string similarity search library. Version 1.3.0 supports Python 3.7-3.13 across multiple platforms. It provides efficient approximate string matching using various similarity measures (cosine, dice, jaccard, overlap, exact) with configurable feature sizes.","status":"active","version":"1.3.0","language":"python","source_language":"en","source_url":"https://github.com/percevalw/simstring","tags":["string-similarity","approximate-string-matching","simstring","n-gram"],"install":[{"cmd":"pip install pysimstring","lang":"bash","label":"pip install"}],"dependencies":[],"imports":[{"note":"The correct import is the module name itself; no subpackage.","wrong":null,"symbol":"simstring","correct":"import simstring"}],"quickstart":{"code":"import simstring\n\n# Build a database from a list of strings\ndb = simstring.reader()\ndb.add('hello')\ndb.add('hallo')\ndb.add('hullo')\ndb.add('world')\n\n# Use cosine similarity with threshold 0.7\nresults = db.retrieve('hallo', measure='cosine', threshold=0.7)\nprint(results)  # ['hello', 'hallo', 'hullo']","lang":"python","description":"Create a simstring database, add strings, and retrieve similar strings using cosine similarity."},"warnings":[{"fix":"Explicitly set measure='cosine', 'dice', 'jaccard', or 'overlap'.","message":"The default similarity measure is 'exact' (not cosine). Specify the desired measure via the `measure` parameter if you need fuzzy matching.","severity":"gotcha","affected_versions":"all"},{"fix":"Use `db = simstring.reader()`.","message":"The reader must be created via `simstring.reader()`, not `simstring.SimString()` or any other constructor. The API changed from the original simstring C++ library.","severity":"gotcha","affected_versions":">=1.0.0"},{"fix":"Insert strings in batches, or use `simstring.writer()` (if available) to write the database to disk; the API is currently reader/writer-based only.","message":"The database is stored in memory. Adding a large number of strings can consume significant memory.","severity":"gotcha","affected_versions":"all"}],"env_vars":null,"last_verified":"2026-04-27T00:00:00.000Z","next_check":"2026-07-26T00:00:00.000Z","problems":[{"fix":"Replace `from simstring import SimString` with `import simstring; db = simstring.reader()`.","cause":"Incorrect class import; the library exposes a `reader()` function, not a class.","error":"AttributeError: module 'simstring' has no attribute 'SimString'"},{"fix":"Ensure pysimstring version >= 1.0.0; call `db.retrieve('query', measure='cosine', threshold=0.7)`.","cause":"Using an outdated API in which `measure` was positional or not accepted.","error":"TypeError: retrieve() got an unexpected keyword argument 'measure'"}],"ecosystem":"pypi","meta_description":null,"install_score":null,"install_tag":null,"quickstart_score":null,"quickstart_tag":null}