{"id":7641,"library":"pyuca","title":"pyuca: Unicode Collation Algorithm","description":"pyuca is a pure-Python implementation of the Unicode Collation Algorithm (UCA), designed to sort non-English strings correctly by accounting for linguistic rules such as accents, contractions, and expansions. It implements multi-level comparison and passes the UCA conformance tests for several Unicode versions; which version is used depends on the Python environment's `unicodedata` module. The latest release is version 1.2, from September 2017. The library still works and is still in use (e.g., in Fedora packages), but it is not actively maintained, and some consider it slightly obsolete.","status":"maintenance","version":"1.2","language":"en","source_language":"en","source_url":"http://github.com/jtauber/pyuca","tags":["unicode","collation","sorting","i18n","l10n"],"install":[{"cmd":"pip install pyuca","lang":"bash","label":"Install latest version"}],"dependencies":[],"imports":[{"note":"Version-specific collators (e.g., Collator_8_0_0) can be imported directly, but `from pyuca import Collator` is the recommended import: it selects the collator matching the Unicode version of your Python's `unicodedata` module.","wrong":"from pyuca.collator import Collator_X_Y_Z","symbol":"Collator","correct":"from pyuca import Collator"}],"quickstart":{"code":"from pyuca import Collator\n\n# Create the Collator once and reuse it: construction loads and parses the\n# bundled collation table, which is relatively slow. 
pyuca selects the\n# Unicode version that matches your Python environment.\nc = Collator()\n\ndef sort_strings_pyuca(strings):\n    # Pass the collator's sort_key method to Python's built-in sorted()\n    return sorted(strings, key=c.sort_key)\n\n# Example usage:\nwords = [\"cafe\", \"caff\", \"café\", \"cozy\", \"česky\"]\nprint(f\"Original list: {words}\")\nsorted_words = sort_strings_pyuca(words)\nprint(f\"Sorted list (pyuca): {sorted_words}\")\n\n# Accent differences only count when the base letters are equal, so 'café'\n# sorts right after 'cafe'; plain code-point order would put it after 'caff'.\nassert sort_strings_pyuca([\"cafe\", \"caff\", \"café\"]) == [\"cafe\", \"café\", \"caff\"]\n","lang":"python","description":"This quickstart creates a single `Collator` and passes its `sort_key` method to Python's built-in `sorted()` to achieve linguistically correct sorting of Unicode strings. The `Collator` automatically adapts to the Unicode version supported by your Python installation."},"warnings":[{"fix":"Evaluate whether the existing functionality meets your needs. For projects that require active development, support for newer Unicode versions, or continuous maintenance, consider other internationalization libraries, such as PyICU (the Python bindings for ICU).","message":"`pyuca` has not seen active development since its last release in September 2017. It still works, but new features and bug fixes should not be expected, and some consider it 'slightly obsolete' compared with more actively maintained libraries.","severity":"deprecated","affected_versions":"1.2 and earlier"},{"fix":"For high-performance scenarios, benchmark `pyuca` against your requirements. Create the `Collator` once and reuse that instance rather than constructing a new one for every sort. 
If it is still too slow, precompute and store sort keys for strings you sort repeatedly, or consider a C-backed library such as PyICU.","message":"As a pure-Python implementation, `pyuca` can be noticeably slower than C-extension-based collation engines on very large datasets or in performance-critical code.","severity":"gotcha","affected_versions":"All versions"},{"fix":"For language-specific collation beyond the Default Unicode Collation Element Table (DUCET) that `pyuca` uses, you will likely need another library, such as PyICU, which provides locale-tailored collators. Python's built-in `locale` module (e.g., `locale.strxfrm`) can sort according to the current locale. However, that locale is process-wide state set with `locale.setlocale()`, which is not thread-safe, so this approach is a poor fit for multithreaded web servers.","message":"`pyuca` provides general-purpose Unicode collation but does not directly support language-specific tailoring (e.g., a custom character order for a particular language or dialect). Editing `allkeys.txt` to achieve this is complex and error-prone.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Make sure your Python version is recent enough to provide the Unicode version you need. `pyuca` 1.2 uses Unicode 8.0.0 on Python 3.5, 9.0.0 on 3.6, and 10.0.0 on Python 3.7 and later.","message":"The UCA version that `pyuca` uses depends on the Unicode version of the `unicodedata` module in your Python environment. 
Older Python versions may not support the latest Unicode standards.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Use `pyuca.Collator` to generate Unicode-aware sort keys:\n```python\nfrom pyuca import Collator\ncollator = Collator()\nwords = [\"résumé\", \"resume\", \"résiste\"]\nsorted_words = sorted(words, key=collator.sort_key)\n# sorted_words is ['résiste', 'resume', 'résumé']\n# (plain sorted(words) gives ['resume', 'résiste', 'résumé'])\n```","cause":"Python's default `sorted()` compares strings code point by code point, which ignores the linguistic rules of many languages (e.g., accents, ligatures, contractions, expansions).","error":"My non-English strings are not sorting alphabetically correctly with Python's default `sorted()` function."},{"fix":"Initialize the `Collator` object only once and reuse it across sorting operations. If performance remains critical, profile your code and consider a collation library with a C-backed implementation, such as PyICU.","cause":"`pyuca` is a pure-Python library, and the Unicode Collation Algorithm itself is computationally intensive. For very large collections of strings, the overhead can be noticeable.","error":"My application is slow when sorting many Unicode strings, even with `pyuca`."},{"fix":"Check whether the order you expect follows the standard UCA rules or a language-specific tailoring. `pyuca` is not designed for injecting custom rules. For tailored collation, consider another internationalization library such as PyICU (the Python bindings for ICU, whose API differs from `pyuca`'s), which offers locale-specific collators and more control over collation rules.","cause":"`pyuca` uses the Default Unicode Collation Element Table (DUCET) without language-specific tailoring. 
Many languages have tailored collation rules that deviate from the DUCET (for example, Swedish sorts 'ä' and 'ö' after 'z'), and `pyuca` offers no straightforward way to apply such custom rules.","error":"The sorting order for specific characters in my language isn't quite right, even with `pyuca`."}]}