PyICU

2.16.2 · active · verified Tue Apr 14

PyICU is a Python extension that wraps the International Components for Unicode (ICU) C++ libraries. It provides Unicode and globalization support for applications, including locale-aware text formatting, collation, character set conversion, and text boundary analysis. The current version is 2.16.2; the library is actively released, and new versions often track updates to the underlying ICU C++ library.

Warnings

Install

Imports

Quickstart

This quickstart demonstrates how to create a `Locale` object to get a locale's display name and how to use `BreakIterator` for locale-aware text segmentation into grapheme clusters. PyICU exposes much of the underlying ICU C++ API directly.

import icu

# Example 1: Locale-aware display name
locale = icu.Locale('pt_BR')
name = locale.getDisplayName()
print(f"Locale display name: {name}")

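# Example 2: Text segmentation (grapheme clusters)
text_to_segment = "café emoji 👨‍👩‍👧‍👦"
# ICU reports boundaries as UTF-16 code unit offsets, which do not match
# Python str indices once the text contains non-BMP characters (the emoji
# here). Wrapping the text in UnicodeString keeps both in the same units.
ustring = icu.UnicodeString(text_to_segment)
breaker = icu.BreakIterator.createCharacterInstance(icu.Locale())
breaker.setText(ustring)

grapheme_clusters = []
start = 0
for end in breaker:  # iterating yields each successive boundary offset
    grapheme_clusters.append(str(ustring[start:end]))
    start = end

print(f"Grapheme clusters for '{text_to_segment}': {grapheme_clusters}")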
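One detail worth keeping in mind with the segmentation example: ICU stores text as UTF-16, so the boundaries a `BreakIterator` reports count UTF-16 code units, while Python `str` indices count code points. The two diverge once the text contains characters outside the Basic Multilingual Plane, such as most emoji. The sketch below uses only the standard library (no PyICU calls) to illustrate the mismatch; the helper names `utf16_length` and `utf16_offset_to_index` are hypothetical and not part of PyICU.

```python
def utf16_length(text: str) -> int:
    """Return the number of UTF-16 code units needed to encode text."""
    return len(text.encode("utf-16-le")) // 2


def utf16_offset_to_index(text: str, offset: int) -> int:
    """Convert a UTF-16 code unit offset into a Python str index.

    Hypothetical helper for illustration. An offset that falls inside
    a surrogate pair is rounded up to the next character.
    """
    units = 0
    for index, char in enumerate(text):
        if units >= offset:
            return index
        # Characters above U+FFFF occupy two UTF-16 code units.
        units += 2 if ord(char) > 0xFFFF else 1
    return len(text)


sample = "a👨b"
code_points = len(sample)            # Python counts code points
code_units = utf16_length(sample)    # ICU counts UTF-16 code units
b_index = utf16_offset_to_index(sample, 3)

print(code_points, code_units, b_index)
```

In practice it is usually simpler to slice an `icu.UnicodeString` directly, since it shares ICU's offsets; a conversion like this is only needed if the boundaries have to be applied to a plain Python `str`.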
