{"id":8000,"library":"cdifflib","title":"cdifflib","description":"cdifflib is a Python library that provides a C implementation of part of Python's standard `difflib` module, focusing on `SequenceMatcher`. It defines a `CSequenceMatcher` class that subclasses `difflib.SequenceMatcher` and inherits most of its methods, giving up to a 4x speed improvement when diffing large sequences. The current version is 1.2.9; maintenance releases are irregular but ongoing, mainly to support newer Python versions and fix reported issues.","status":"active","version":"1.2.9","language":"en","source_language":"en","source_url":"https://github.com/mduggan/cdifflib","tags":["difflib","performance","c-extension","sequence-matching","diff"],"install":[{"cmd":"pip install cdifflib","lang":"bash","label":"Install with pip"}],"dependencies":[],"imports":[{"note":"Import `CSequenceMatcher` by name from the `cdifflib` module. A bare `import cdifflib` does not bind `CSequenceMatcher` in your namespace; with that form the class must be referenced as `cdifflib.CSequenceMatcher`.","wrong":"import cdifflib","symbol":"CSequenceMatcher","correct":"from cdifflib import CSequenceMatcher"},{"note":"Rebinding `difflib.SequenceMatcher` to `CSequenceMatcher` transparently swaps in the C version for any code that looks the name up through the `difflib` module at call time, including `difflib`'s own helpers such as `get_close_matches` and `Differ`. Modules that already ran `from difflib import SequenceMatcher` before the patch keep a reference to the original class.","symbol":"SequenceMatcher_monkey_patch","correct":"import difflib\nfrom cdifflib import CSequenceMatcher\n\n# Rebind the module attribute so difflib and its callers use the C version\ndifflib.SequenceMatcher = CSequenceMatcher"}],"quickstart":{"code":"from cdifflib import CSequenceMatcher\n\n# Example 1: Basic sequence matching\n# find_longest_match(alo, ahi, blo, bhi) searches a[alo:ahi] and b[blo:bhi]\ns = CSequenceMatcher(None, ' abcd', 'abcd abcd')\nmatch = s.find_longest_match(0, 5, 0, 9)\nprint(f\"Longest match: {match}\")\n\n# Example 2: Junk filter that treats spaces as junk\ns2 = CSequenceMatcher(lambda x: x == \" \",\n                      \"private Thread currentThread;\",\n                      \"private volatile Thread currentThread;\")\nratio = round(s2.ratio(), 3)\nprint(f\"Similarity ratio: {ratio}\")","lang":"python","description":"This quickstart shows how to instantiate `CSequenceMatcher` and call its `find_longest_match` and `ratio` methods, the same way you would with `difflib.SequenceMatcher`."},"warnings":[{"fix":"For very large inputs, pass `a` and `b` to the `CSequenceMatcher` constructor as `list` instances so that no conversion copy is needed.","message":"`CSequenceMatcher` converts the input sequences `a` and `b` to `list` if they are not already lists (for example, a `str` becomes a list of characters). The conversion is convenient, but it copies the entire sequence, which adds time and memory overhead proportional to its length when the inputs are very large.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Make sure every item in the input sequences is hashable. Convert unhashable elements such as lists or dictionaries to hashable equivalents (for example, tuples or strings) before creating the `CSequenceMatcher`.","message":"As with `difflib.SequenceMatcher`, the elements of the input sequences `a` and `b` must be hashable, because the underlying C implementation hashes sequence items. Passing sequences that contain unhashable types (such as lists or dictionaries) raises `TypeError: unhashable type: ...`.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Upgrade to `cdifflib` 1.2.0 or newer. When installing from source, make sure a compatible C compiler is available.","message":"Before version 1.2.0, `cdifflib` had installation problems on Python 3, particularly when it had to be built from source because no pre-compiled wheel was available. The C extension often failed to build correctly, which led to `AttributeError: module 'cdifflib' has no attribute 'CSequenceMatcher'`.","severity":"breaking","affected_versions":"<1.2.0"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Make sure you are using `cdifflib>=1.2.0`. If the problem persists and no pre-compiled wheel is available for your platform, install a C compiler (e.g., `build-essential` on Linux, the Xcode Command Line Tools on macOS, or the Microsoft C++ Build Tools on Windows), then reinstall with `pip install --upgrade --force-reinstall cdifflib`.","cause":"This typically occurs with older `cdifflib` versions (before 1.2.0) on Python 3, when the C extension was not properly built and linked during installation, so `CSequenceMatcher` was never exposed.","error":"AttributeError: module 'cdifflib' has no attribute 'CSequenceMatcher'"},{"fix":"Make every element of the input sequences hashable. For example, convert inner lists to tuples (`[tuple(x) for x in seq]`), so that `[['a'], ['b']]` becomes `[('a',), ('b',)]`: `sm = CSequenceMatcher(None, [('a',), ('b',)], [('a',), ('c',)])`.","cause":"`CSequenceMatcher` requires every element of the `a` and `b` sequences to be hashable. This error is raised when a sequence contains mutable types such as lists or dictionaries.","error":"TypeError: unhashable type: 'list'"},{"fix":"Install \"Build Tools for Visual Studio\" from the Microsoft link in the error message, selecting the 'Desktop development with C++' workload. Alternatively, use a Python version for which `cdifflib` publishes pre-compiled wheels.","cause":"On Windows, if no pre-compiled wheel is available for your Python version and architecture, `pip` tries to compile `cdifflib` from source, which requires a compatible C/C++ compiler.","error":"error: Microsoft Visual C++ 14.0 or greater is required. Get it with \"Microsoft C++ Build Tools\": https://visualstudio.microsoft.com/visual-cpp-build-tools/"}]}