{"id":2792,"library":"stringzilla","title":"StringZilla","description":"StringZilla is a Python library that significantly accelerates string operations like search, hashing, sorting, and processing, leveraging SIMD (Single Instruction, Multiple Data) and SWAR (SIMD Within A Register) for performance. It's designed to handle large textual datasets (100+ GB) efficiently, offering a `Str` class similar to Python's built-in `str` and a `File` class for memory-mapping files. Currently at version 4.6.0, it sees frequent updates with multiple patch and minor releases per month.","status":"active","version":"4.6.0","language":"en","source_language":"en","source_url":"https://github.com/ashvardanian/StringZilla","tags":["strings","SIMD","performance","search","hashing","unicode","optimization","text-processing","memory-mapping"],"install":[{"cmd":"pip install stringzilla","lang":"bash","label":"For serial algorithms"},{"cmd":"pip install stringzillas-cpus","lang":"bash","label":"For parallel multi-CPU backends"},{"cmd":"pip install stringzillas-cuda","lang":"bash","label":"For parallel Nvidia GPU backends"}],"dependencies":[{"reason":"Requires Python 3.8 or newer.","package":"python","optional":false}],"imports":[{"note":"Main string class for in-memory byte arrays.","symbol":"Str","correct":"from stringzilla import Str"},{"note":"Memory-maps files for immutable, shared access.","symbol":"File","correct":"from stringzilla import File"},{"note":"Class for collections of strings, similar to `list[str]`, used in split operations.","symbol":"Strs","correct":"from stringzilla import Strs"}],"quickstart":{"code":"from stringzilla import Str, File\n\n# Using Str for in-memory strings\ntext_str = Str('hello stringzilla and the world!')\nprint(f\"Length of text_str: {len(text_str)}\")\nprint(f\"Does 'stringzilla' exist? 
{'stringzilla' in text_str}\")\nprint(f\"Index of 'world': {text_str.find('world')}\")\n\n# Example with File: write a small sample file, then memory-map it\nimport os\nwith open('example.txt', 'w') as f:\n    f.write('This is a test file for StringZilla.\\n')\n    f.write('It demonstrates memory-mapped file usage.')\n\n# Wrap the mapping in Str to get the string-like API\nfile_str = Str(File('example.txt'))\nprint(f\"Length of file_str: {len(file_str)}\")\nprint(f\"Does 'test file' exist? {'test file' in file_str}\")\nprint(f\"Lines in file_str: {len(file_str.splitlines())}\")\n\n# Release the mapping, then clean up the sample file\ndel file_str\nos.remove('example.txt')","lang":"python","description":"Demonstrates basic usage of `Str` for in-memory string operations and `File` (wrapped in `Str`) for memory-mapped file handling, including length, substring checks, and finding substrings."},"warnings":[{"fix":"Install `stringzillas-cpus` (`pip install stringzillas-cpus`) for multi-CPU backends or `stringzillas-cuda` (`pip install stringzillas-cuda`) for Nvidia GPU backends.","message":"The base `stringzilla` package provides only serial algorithms. Parallel multi-CPU backends require `stringzillas-cpus`, and Nvidia GPU backends require `stringzillas-cuda`. Both are separate packages that must be installed explicitly.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Consult `stringzilla.__capabilities__` to see which hardware features were detected. Performance scales with the available SIMD instructions.","message":"Peak performance and some advanced features (e.g., specific case-folding and case-insensitive search paths) depend on modern CPU extensions such as AVX-512, Arm Neon, or SVE. 
Running on older hardware or virtualized environments without these features may result in lower performance than expected or fallback to less optimized scalar implementations.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Pin `stringzillas` versions carefully in production environments and review release notes for breaking changes upon upgrade.","message":"The `stringzillas` (parallel) components are explicitly noted as being in 'beta' and 'under active development, and are likely to break in subsequent releases'. Users of these parallel backends should anticipate potential API changes or breaking changes between minor versions.","severity":"gotcha","affected_versions":"All `stringzillas` versions"},{"fix":"Always provide UTF-8 encoded strings to StringZilla functions for full Unicode compliance. Use `str.encode('utf-8')` if necessary.","message":"StringZilla aims for full Unicode 17.0 compliance, particularly for case-folding and case-insensitive searches. Ensure that your input data is correctly encoded in UTF-8 to leverage these features accurately. Incorrect encodings can lead to unexpected search results or behavior.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-10T00:00:00.000Z","next_check":"2026-07-09T00:00:00.000Z"}