PyICU
PyICU is a Python extension that wraps the International Components for Unicode (ICU) C++ libraries. It provides robust, full-featured Unicode and globalization support for applications, handling tasks such as locale-aware text formatting, collation, character set conversions, and boundary analysis. The current version is 2.16.2, and the library maintains an active release cadence, often aligning with updates to the underlying ICU C++ library.
Warnings
- gotcha PyICU requires the underlying ICU C++ libraries to be installed on your system. `pip install pyicu` installs only the Python bindings; it does not install the native ICU libraries. You may need to set environment variables (e.g., `LD_LIBRARY_PATH`, `DYLD_LIBRARY_PATH`, `PATH`) or configure `pkg-config` so that the build and the runtime loader can find them.
- gotcha PyICU's API closely mirrors the ICU4C C++ API, and there is no dedicated Python API documentation. Users must refer to the ICU4C C++ API documentation and translate patterns to Python.
- gotcha String handling in PyICU can be subtle because ICU's `UnicodeString` is mutable while Python's `str` (or `unicode` in Python 2) is immutable. ICU APIs may modify `UnicodeString` objects in place. PyICU often overloads functions to also accept Python strings and converts them to and from `UnicodeString` implicitly, assuming UTF-8 when it is given a byte string (`str` in Python 2).
- breaking Depending on the version of the ICU C++ library you are building against, specific C++ standard compiler flags are required. ICU versions 60-74 require `-std=c++11`, while ICU 75 and later require `-std=c++17`. Failure to include the correct flag can lead to build errors.
- deprecated The `icu-config` program for locating ICU libraries has been deprecated since ICU 63.1. `pkg-config` is now the recommended tool for this purpose.
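The compiler-flag rule and the `pkg-config` recommendation above can be combined in a small pre-build check. This is a minimal sketch, not part of PyICU: the helper names `cxx_standard_flag` and `installed_icu_version` are hypothetical, and it assumes ICU is registered with `pkg-config` under the module name `icu-i18n`. How the flag is then passed to the build depends on your setup.

```python
import shutil
import subprocess
from typing import Optional


def cxx_standard_flag(icu_version: str) -> Optional[str]:
    """Return the C++ standard flag for an ICU version string such as '74.2'."""
    major = int(icu_version.split(".")[0])
    if major >= 75:
        return "-std=c++17"
    if major >= 60:
        return "-std=c++11"
    return None  # older ICU: the warnings above name no required flag


def installed_icu_version() -> Optional[str]:
    """Ask pkg-config for the ICU version; None if it cannot be determined."""
    if shutil.which("pkg-config") is None:
        return None
    result = subprocess.run(
        ["pkg-config", "--modversion", "icu-i18n"],  # assumed module name
        capture_output=True,
        text=True,
    )
    return result.stdout.strip() if result.returncode == 0 else None


version = installed_icu_version()
if version is not None:
    print(f"ICU {version}: build with {cxx_standard_flag(version)}")
else:
    print("ICU not found via pkg-config")
```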
Install
- pip install pyicu
- brew install pkg-config icu4c
  export PATH="$(brew --prefix)/opt/icu4c/bin:$(brew --prefix)/opt/icu4c/sbin:$PATH"
  export PKG_CONFIG_PATH="$PKG_CONFIG_PATH:$(brew --prefix)/opt/icu4c/lib/pkgconfig"
  pip install pyicu
- apt-get install python3-icu
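After installing, it can help to confirm that the bindings import and to see which ICU library they are linked against. A minimal sketch follows; the function name `pyicu_status` is hypothetical, and the `VERSION` and `ICU_VERSION` module attributes are read defensively in case a given build does not expose them.

```python
def pyicu_status() -> str:
    """Describe whether PyICU imports and which versions are in use."""
    try:
        import icu
    except ImportError as exc:
        # Typically means the bindings or the native ICU libraries are missing
        return f"PyICU is not importable: {exc}"
    pyicu_version = getattr(icu, "VERSION", "unknown")
    icu_version = getattr(icu, "ICU_VERSION", "unknown")
    return f"PyICU {pyicu_version} linked against ICU {icu_version}"


print(pyicu_status())
```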
Imports
- Locale
from icu import Locale
- UnicodeString
from icu import UnicodeString
- BreakIterator
from icu import BreakIterator
- Collator
from icu import Collator
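As an illustration of how two of these imports fit together, the sketch below sorts a list of strings with a locale-specific `Collator`. The function name `sort_for_locale` is hypothetical; the import is placed inside the function only so that the definition itself loads on systems where PyICU is not yet installed.

```python
def sort_for_locale(words, locale_name):
    """Return words sorted by the collation rules of the given locale."""
    from icu import Collator, Locale  # requires PyICU and the native ICU libraries

    collator = Collator.createInstance(Locale(locale_name))
    # getSortKey returns a byte sequence whose ordering matches the collation order
    return sorted(words, key=collator.getSortKey)
```

Calling `sort_for_locale(["zebra", "äpple", "apple"], "sv_SE")` should place `äpple` last, because Swedish orders `ä` after `z`.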
Quickstart
import icu

# Example 1: Locale-aware display name
locale = icu.Locale('pt_BR')
name = locale.getDisplayName()
print(f"Locale display name: {name}")

# Example 2: Text segmentation (grapheme clusters)
text_to_segment = "café emoji 👨👩👧👦"
breaker = icu.BreakIterator.createCharacterInstance(icu.Locale())
breaker.setText(text_to_segment)
# ICU reports boundaries as UTF-16 code-unit offsets, not Python str indices,
# so slice the UTF-16 encoding (2 bytes per code unit) instead of the str.
utf16 = text_to_segment.encode("utf-16-le")
grapheme_clusters = []
i = 0
for j in breaker:
    grapheme_clusters.append(utf16[2 * i:2 * j].decode("utf-16-le"))
    i = j
print(f"Grapheme clusters for '{text_to_segment}': {grapheme_clusters}")
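ICU reports break positions as UTF-16 code-unit offsets, whereas a Python `str` is indexed by code point, so the two disagree once the text contains characters outside the Basic Multilingual Plane (most emoji, for example). When the boundaries need to be used as indices into the Python string itself, they can be translated first. The helper below is a pure-Python sketch with a hypothetical name, not a PyICU function.

```python
def utf16_offsets_to_str_indices(text, offsets):
    """Map UTF-16 code-unit offsets to indices into the Python str `text`."""
    mapping = {}
    utf16_pos = 0
    for index, ch in enumerate(text):
        mapping[utf16_pos] = index
        # Code points above U+FFFF take two UTF-16 code units (a surrogate pair)
        utf16_pos += 2 if ord(ch) > 0xFFFF else 1
    mapping[utf16_pos] = len(text)
    # An offset that falls inside a surrogate pair raises KeyError
    return [mapping[offset] for offset in offsets]


sample = "a😀b"
# UTF-16 layout: 'a' at 0, the emoji at 1-2, 'b' at 3, end of text at 4
print(utf16_offsets_to_str_indices(sample, [1, 3, 4]))  # [1, 2, 3]
```

The returned indices can be used directly in slices such as `sample[1:2]`.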