Chinese Pinyin Conversion (pypinyin)
pypinyin is a Python library for converting Chinese characters to Pinyin. It chooses the most fitting pronunciation based on the phrase a character appears in, and supports heteronyms (characters with multiple pronunciations), simplified and traditional Chinese, Zhuyin, and multiple Pinyin styles (e.g., tone marks, tone numbers, first letter). The library is actively maintained; version 0.55.0 is a recent release.
Warnings
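- gotcha By default, pypinyin does not mark neutral tones (pass `neutral_tone_with_five=True` to mark them with 5 in numeric tone styles), and tone-less styles write 'ü' as 'v' (pass `v_to_u=True` to get 'ü').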
- gotcha Under standard Pinyin rules, 'y', 'w', and 'ü' (written 'yu') are not syllable initials. pypinyin follows this by default (`strict=True`), so `Style.INITIALS` returns an empty string for syllables that begin with them; pass `strict=False` to get 'y' and 'w' back.
- gotcha Characters without Pinyin (e.g., symbols, non-Chinese characters) are returned as-is by default (`errors='default'`); pass `errors='ignore'` to drop them.
- breaking On Python 3.12, pypinyin versions before 0.52.0 import significantly more slowly, especially under a debugger or with `pytest --cov`. Upgrading to 0.52.0 or later avoids this.
- gotcha When bundling applications with PyInstaller, older versions of pypinyin could fail to locate their internal data files, leading to `no such file or directory: pinyin_dict.json` errors.
Install
pip install pypinyin
Imports
- pinyin
from pypinyin import pinyin
- lazy_pinyin
from pypinyin import lazy_pinyin
- Style
from pypinyin import Style
Quickstart
from pypinyin import pinyin, lazy_pinyin, Style
chinese_text = "你好,世界!"
# Convert to Pinyin with tone marks (default style)
pinyin_result_toned = pinyin(chinese_text)
print(f"Toned Pinyin: {pinyin_result_toned}")
# Convert to Pinyin without tone marks (lazy_pinyin)
pinyin_result_lazy = lazy_pinyin(chinese_text)
print(f"Lazy Pinyin: {pinyin_result_lazy}")
# Convert to Pinyin using first letter style
pinyin_result_first_letter = pinyin(chinese_text, style=Style.FIRST_LETTER)
print(f"First Letter Pinyin: {pinyin_result_first_letter}")
# Handle heteronyms (multi-pronunciation characters)
heteronym_text = "中心"
pinyin_heteronym = pinyin(heteronym_text, heteronym=True)
print(f"Heteronym Pinyin for '中心': {pinyin_heteronym}")