{"id":4028,"library":"grapheme","title":"Grapheme Unicode Helpers","description":"The `grapheme` library (current version 0.10.0) provides helpers for Unicode grapheme-aware string handling in Python. It enables accurate counting, slicing, and manipulation of strings based on user-perceived characters (graphemes) rather than Unicode code points. The library is actively maintained, supporting recent Unicode standards, and typically releases new versions a few times a year.","status":"active","version":"0.10.0","language":"en","source_language":"en","source_url":"https://github.com/timendum/grapheme","tags":["unicode","grapheme","text processing","string manipulation"],"install":[{"cmd":"pip install grapheme","lang":"bash","label":"Install latest version"}],"dependencies":[{"reason":"Required Python version as specified by PyPI metadata.","package":"python","optional":false}],"imports":[{"note":"Python's built-in `len()` counts Unicode code points, not user-perceived graphemes. For strings with combining characters or emojis, `grapheme.length()` provides the correct visual count.","wrong":"len('string')","symbol":"length","correct":"import grapheme\ngrapheme.length('string')"},{"note":"Native Python string slicing operates on code points and can break grapheme clusters. 
`grapheme.slice()` ensures slicing respects grapheme boundaries.","wrong":"'string'[0:5]","symbol":"slice","correct":"import grapheme\ngrapheme.slice('string', start=0, end=5)"},{"note":"`grapheme.graphemes()` returns an iterator over the grapheme clusters in a string.","symbol":"graphemes","correct":"import grapheme\nlist(grapheme.graphemes('string'))"}],"quickstart":{"code":"import grapheme\n\nrainbow_flag = \"🏳️‍🌈\" # One emoji built from four code points\n\n# Count user-perceived characters (graphemes)\nvisual_length = grapheme.length(rainbow_flag)\nprint(f\"Visual length of '{rainbow_flag}': {visual_length}\") # Expected: 1\n\n# Built-in len() counts code points, not graphemes\ncodepoint_length = len(rainbow_flag)\nprint(f\"Code point length of '{rainbow_flag}': {codepoint_length}\") # Expected: 4\n\n# Slice on grapheme boundaries\ntext = \"tamil நி (ni)\"\nsliced_by_grapheme = grapheme.slice(text, end=7)\nprint(f\"Grapheme-sliced: '{sliced_by_grapheme}'\") # Expected: 'tamil நி'\n\n# Built-in slicing works on code points and splits the cluster\nunsafely_sliced = text[:7]\nprint(f\"Codepoint-sliced: '{unsafely_sliced}'\") # Expected: 'tamil ந'\n","lang":"python","description":"This example contrasts `grapheme.length()` and `grapheme.slice()`, which operate on user-perceived characters (graphemes), with Python's built-in `len()` and slicing, which operate on Unicode code points."},"warnings":[{"fix":"Upgrade to Python 3.7 or newer, or pin `grapheme<0.7.0`.","message":"Python 3.6 support was dropped with version `0.7.0`. Users on Python 3.6 should pin their `grapheme` dependency to `<0.7.0`.","severity":"breaking","affected_versions":">=0.7.0"},{"fix":"Upgrade to Python 3.10 or newer, or pin `grapheme<0.9.0`.","message":"Versions `0.9.0` and later (including the current `0.10.0`) require Python >=3.10. 
On older Python versions (e.g., 3.8 or 3.9), either upgrade Python or pin an older `grapheme` release.","severity":"breaking","affected_versions":">=0.9.0"},{"fix":"Evaluate performance needs; for extremely long strings where approximate length or code point-based operations are acceptable, native string methods might be faster.","message":"Finding grapheme cluster boundaries requires scanning the string, so the library's functions run in linear time (`O(n)`) relative to string length. For performance-critical applications involving very long strings, weigh correctness against speed.","severity":"gotcha","affected_versions":"all"},{"fix":"Use non-negative indices for the `start` and `end` arguments of `grapheme.slice()`.","message":"Negative indexing (e.g., `grapheme.slice(text, start=-1)`) is not currently supported by `grapheme.slice()` and raises a `NotImplementedError`.","severity":"gotcha","affected_versions":"all"},{"fix":"Always use `grapheme.contains(main_string, sub_string)` for grapheme-aware substring checks.","message":"Python's `in` operator checks substrings by Unicode code points, so it can match part of a grapheme cluster. `grapheme.contains()` matches only whole graphemes, so the two can disagree for multi-code point graphemes (e.g., emojis or combining characters).","severity":"gotcha","affected_versions":"all"}],"env_vars":null,"last_verified":"2026-04-11T00:00:00.000Z","next_check":"2026-07-10T00:00:00.000Z"}