Grapheme Unicode Helpers

0.10.0 · active · verified Sat Apr 11

The `grapheme` library (current version 0.10.0) provides helpers for grapheme-aware Unicode string handling in Python. It counts, slices, and otherwise manipulates strings by user-perceived characters (graphemes) rather than by Unicode code points. The library is actively maintained, tracks recent Unicode standards, and typically releases new versions a few times a year.
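
To make the grapheme versus code point distinction concrete, the sketch below uses only the standard library's `unicodedata` module (not `grapheme` itself) to list the code points stored behind a single visible emoji. The variable names are illustrative.

```python
import unicodedata

# Rainbow flag emoji, written with explicit escapes so each code point is visible.
rainbow_flag = "\U0001F3F3\uFE0F\u200D\U0001F308"

# One user-perceived character on screen, several code points underneath.
code_points = [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in rainbow_flag]

for hex_value, name in code_points:
    print(hex_value, name)

# The built-in len() reports the number of code points.
print(len(rainbow_flag))
```

The flag is a white flag, a variation selector, a zero width joiner, and a rainbow, which is why code-point-based operations see more than one item.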

Warnings

Install

Imports

Quickstart

This example uses `grapheme.length()` and `grapheme.slice()` to handle user-perceived characters (graphemes) correctly, and contrasts them with Python's built-in `len()` and slicing, which operate on Unicode code points.

import grapheme

rainbow_flag = "🏳️‍🌈" # An emoji represented by multiple code points

# Correctly count graphemes
visual_length = grapheme.length(rainbow_flag)
print(f"Visual length of '{rainbow_flag}': {visual_length}") # Expected: 1

# Built-in len() counts code points, not graphemes
codepoint_length = len(rainbow_flag)
print(f"Code point length of '{rainbow_flag}': {codepoint_length}") # Expected: 4

# Safely slice by graphemes
text = "tamil நி (ni)"
sliced_by_grapheme = grapheme.slice(text, end=7)
print(f"Grapheme-sliced: '{sliced_by_grapheme}'") # Expected: 'tamil நி'

# Unsafely slice by code points
unsafely_sliced = text[:7]
print(f"Codepoint-sliced: '{unsafely_sliced}'") # Expected: 'tamil ந'
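
The code-point slice above cuts the Tamil syllable in half because the syllable is stored as a base letter followed by a vowel sign. As a rough illustration using only the standard library, the code points on either side of the cut can be inspected. The `describe` helper is hypothetical and not part of `grapheme`.

```python
import unicodedata

# Same string as in the quickstart, with the Tamil syllable written as escapes.
text = "tamil \u0ba8\u0bbf (ni)"

def describe(ch):
    """Return the code point, Unicode name, and general category of one character."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch)

kept = describe(text[6])     # last code point kept by text[:7]
dropped = describe(text[7])  # first code point cut off by text[:7]

print(kept)
print(dropped)
```

The dropped code point is a combining mark that belongs to the preceding letter, so a grapheme-aware slice keeps the two together.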
