{"id":7296,"library":"hyperscan","title":"Python Hyperscan","description":"Python bindings for Hyperscan, providing high-performance regular expression matching for large-scale pattern matching tasks, including multi-pattern and streaming modes. Version 0.8.2 is the current release; the project is actively developed and often ships several patch and minor releases within a few months.","status":"active","version":"0.8.2","language":"en","source_language":"en","source_url":"https://github.com/darvid/python-hyperscan","tags":["regex","pattern matching","high performance","security","multi-pattern","streaming"],"install":[{"cmd":"pip install hyperscan","lang":"bash","label":"Install from PyPI"}],"dependencies":[],"imports":[{"note":"The primary import for the Hyperscan library.","symbol":"hyperscan","correct":"import hyperscan"},{"note":"Commonly imported classes and flags for pattern compilation.","symbol":"Database","correct":"from hyperscan import Database, HS_FLAG_CASELESS, HS_FLAG_SOM_LEFTMOST"}],"quickstart":{"code":"import hyperscan\n\nmatches = [] # (pattern ID, start, end) tuples collected by the handler\n\ndef on_match(id: int, from_: int, to: int, flags: int, context: object | None) -> int:\n    print(f\"Match for pattern ID {id} at [{from_}:{to}] with flags {flags}\")\n    matches.append((id, from_, to))\n    return 0 # Falsy return value: continue scanning\n\n# Define patterns with IDs and flags\npatterns_config = [\n    (b'foobar', 101, 0), # Case-sensitive literal match\n    (b'baz', 102, hyperscan.HS_FLAG_CASELESS), # Case-insensitive\n    (b'qux', 103, hyperscan.HS_FLAG_SOM_LEFTMOST | hyperscan.HS_FLAG_SINGLEMATCH) # Report start of match, single match\n]\n\nexpressions, ids, flags = zip(*patterns_config)\n\ndb = hyperscan.Database()\ndb.compile(\n    expressions=expressions,\n    ids=ids,\n    elements=len(patterns_config),\n    flags=flags\n)\n\n# Create a scratch space for scanning\nscratch = db.alloc_scratch()\n\n# Scan a data buffer in block mode: 'foobar' matches exactly, 'BaZ' matches the caseless 'baz'\ndata = b'This is a foobar string with BaZ and qux inside.'\nprint(f\"Scanning data: '{data.decode()}'\")\n# scan() does not return matches; they are delivered only through the handler\ndb.scan(data=data, scratch=scratch, match_event_handler=on_match)\n\nif not matches:\n    print(\"No matches found.\")\n\n# Example of streaming mode\nprint(\"\\n--- Streaming Mode ---\")\ndb_streaming = hyperscan.Database(mode=hyperscan.HS_MODE_STREAM)\ndb_streaming.compile(\n    expressions=[b'stream_test'],\n    ids=[201],\n    elements=1,\n    flags=[0]\n)\nscratch_streaming = db_streaming.alloc_scratch()\n\nwith db_streaming.stream(scratch=scratch_streaming, match_event_handler=on_match) as stream:\n    stream.scan(data=b'first part of stream_test data')\n    stream.scan(data=b'second part of stream_test data')\n    # Matches are reported from stream.scan as soon as they are found; matches that\n    # depend on the end of the data (such as $-anchored patterns) are reported on close.\nprint(\"Streaming scan complete.\")\n","lang":"python","description":"This quickstart demonstrates how to compile multiple regular expressions into a Hyperscan database and then scan input data in both block and streaming modes. Matches are delivered only through the match event handler, which here prints each match and collects it in a list. Remember that patterns must be bytes."},"warnings":[{"fix":"Upgrade to v0.8.2 or newer, the first release outside the affected range. On an affected version, build from source with CMake flags that enable PCRE's UTF-8 support, or pre-encode text to bytes. Note that upstream Hyperscan/Vectorscan has known bugs with `HS_FLAG_UTF8` for certain patterns.","message":"Starting in v0.7.9, the build system migration to CMake changed how PCRE is linked, which can cause 'Expression is not valid UTF-8' errors for valid Unicode patterns. 
This broke existing code that worked in v0.7.8 and earlier, because the CMake build compiles PCRE from source without UTF-8 support enabled.","severity":"breaking","affected_versions":">=0.7.9, <0.8.2"},{"fix":"Use Hyperscan to locate matches, then run a capture-group-capable regex engine (such as Python's `re` module) only on the matched regions when the groups are essential. Hyperscan reports only the end offset of a match unless the pattern is compiled with `HS_FLAG_SOM_LEFTMOST`, so enable that flag if you need to know where each region starts.","message":"Hyperscan does not support capturing sub-expressions (capture groups). If you need to extract specific parts of a matched string, use a two-stage approach: Hyperscan for high-performance identification, then a standard regex engine for detailed extraction on the matched segments.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Always provide a `match_event_handler` if you intend to process matches, and collect results inside it (for example, by appending to a list). The handler's return value also controls termination: returning a falsy value such as `0` continues scanning, while a truthy value stops it.","message":"The `scan` methods (block and stream) do not require a `match_event_handler` callback, but if none is provided, matches are not reported at all and you get no results. 
The `scan` call itself does not return the matches, which is easy to miss if you expect a return value listing them.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Use `HS_FLAG_SOM_LEFTMOST` only on patterns whose start offsets you actually need, and expect it to limit the complexity of patterns that compile.","message":"Using `HS_FLAG_SOM_LEFTMOST` to obtain the leftmost start offset of a match (Start Of Match) can significantly reduce scanning performance and narrow the range of patterns that Hyperscan can compile, potentially leading to 'Pattern too large' errors.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Upgrade to `hyperscan` version `0.8.2` or newer to get correct match offsets for large data buffers.","message":"Versions prior to 0.8.2 had a bug that could truncate match offsets when scanning data buffers larger than 4 GB, producing inaccurate match positions.","severity":"gotcha","affected_versions":"<0.8.2"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Ensure `pip install hyperscan` completes without errors. If building from source, verify all C/C++ build prerequisites (CMake, C/C++ toolchain, Ragel) are met. For older library versions, ensure the system's Hyperscan library matches the expected version (e.g., `v0.1.5` needed Hyperscan `v4.x`, while `v0.2+` needs `v5.x`). Confirm your Python virtual environment is active.","cause":"The underlying C extension for hyperscan failed to build or was not correctly installed or linked, or (when building from source) the Python bindings do not match the version of the underlying Hyperscan/Vectorscan C library.","error":"No module named 'hyperscan._hyperscan'"},{"fix":"If using pre-built wheels, try installing a more generic wheel if available, or ensure your CPU supports the instruction sets used. 
If building from source, set `CMAKE_ARGS=\"-DUSE_CPU_NATIVE=OFF\"` during `pip install .` to disable CPU-native optimizations, or compile on the target machine.","cause":"Hyperscan binaries (especially wheels) can be compiled with CPU-specific optimizations (like AVX instructions). If the CPU importing the library does not support those instructions, the process crashes with an 'Illegal instruction' error.","error":"Illegal instruction (core dumped) or segmentation fault on `import hyperscan`"},{"fix":"Allocate a separate `hyperscan.Scratch` object for each thread or concurrent scan, so that every `db.scan()` or `stream.scan()` call in flight has its own dedicated `scratch` instance. Do not start another scan with the same scratch from inside a match handler.","cause":"A Hyperscan scratch space (`hyperscan.Scratch` object) can be used by only one scan at a time. Hyperscan returns this error when it detects that the scratch is already in use, for example by another thread or by a scan started from inside a match handler that runs on the same scratch.","error":"hyperscan.error: ScratchInUseError('error code -10')"}]}