g2pM - Neural Grapheme-to-Phoneme for Mandarin Chinese
0.1.2.5 | verified Sat May 09 | auth: no | python | maintenance
g2pM is a neural grapheme-to-phoneme (G2P) conversion package for Mandarin Chinese. It converts Chinese characters to pinyin with tone numbers, using a lightweight Bi-LSTM model to disambiguate polyphonic characters. The current version is 0.1.2.5; releases are infrequent, and the last update was in 2021.
pip install g2pm
Common errors
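error ModuleNotFoundError: No module named 'g2pm'
cause Either the package is not installed, or the import uses the wrong case: the importable module is named g2pM (capital M), and Python module names are case-sensitive. The package is also sometimes confused with 'g2p' or 'g2p-en'.
fix
Run: pip install g2pm, then import with: from g2pM import G2pM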
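error AttributeError: module 'g2pm' has no attribute 'G2pM'
cause The name 'g2pm' resolved to something other than the installed package, for example a local file named g2pm.py that shadows it. The G2pM class is exported by the g2pM package (capital M).
fix
Rename or remove the shadowing file, then use: from g2pM import G2pM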
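Both import errors above come down to which module Python actually finds for a given name. The following is a minimal diagnostic sketch using only the standard library; the helper name locate_module is hypothetical, and the two candidate spellings are assumptions about what you might have typed.

```python
import importlib.util


def locate_module(name):
    """Return the file a top-level module name resolves to, or None if it is not importable."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    if spec is None:
        return None
    return spec.origin


# Compare both spellings: a path outside site-packages suggests a shadowing file.
for candidate in ("g2pM", "g2pm"):
    print(candidate, "->", locate_module(candidate))
```

If a name prints None, that spelling is not importable in the current environment; if it prints a path inside your own project, a local file is probably shadowing the package.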
error FileNotFoundError: [Errno 2] No such file or directory: '.../g2pm/model/model.pb'
cause A model file is missing from the installed package, typically because the installation was interrupted or is corrupted. The model ships inside the package rather than being downloaded at runtime.
fix
Reinstall the package (pip install --force-reinstall g2pm) and re-initialize: from g2pM import G2pM; g2p = G2pM()
Warnings
gotcha Input is Chinese text (hanzi), not pinyin. The output is a list of pinyin syllables with tone numbers (e.g., 'ni3'); g2pM does not produce bopomofo/zhuyin.
fix Pass the sentence string directly, e.g. g2p('你好'). If you need zhuyin, convert the pinyin output in a separate step.
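Tone-numbered pinyin tokens such as 'ni3' pack the syllable and its tone into one string. A minimal sketch for separating the two is shown below; split_tone is a hypothetical helper (not part of g2pM), and it assumes tones are written as a trailing digit from 1 to 5, with 5 standing for the neutral tone.

```python
import re

# Letters (including "ü" or the "u:" spelling) followed by one tone digit 1-5.
_TONE_RE = re.compile(r"^([a-zü:]+)([1-5])$", re.IGNORECASE)


def split_tone(token):
    """Split 'ni3' into ('ni', 3); return (token, None) for anything else."""
    match = _TONE_RE.match(token)
    if match is None:
        return token, None  # punctuation, digits, or toneless text pass through
    return match.group(1), int(match.group(2))


print([split_tone(t) for t in ["ni3", "hao3", "，"]])
```

Tokens that do not look like tone-numbered syllables are returned unchanged, which keeps punctuation in place when post-processing a whole sentence.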
gotcha The pre-trained model is small (a couple of megabytes) and bundled with the package; it is loaded from disk each time G2pM() is constructed, with no download on first use.
fix Run g2p = G2pM() once and reuse the instance for all subsequent calls.
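Since constructing G2pM() is the expensive step, it is worth doing once and sharing the instance. A minimal sketch of a lazy wrapper follows; LazyConverter and _load_g2pm are hypothetical names introduced for illustration, and the wrapper assumes only that the loaded object is callable on a string.

```python
class LazyConverter:
    """Build the underlying converter on first call, then reuse it."""

    def __init__(self, factory):
        self._factory = factory
        self._instance = None

    def __call__(self, text, **kwargs):
        if self._instance is None:
            self._instance = self._factory()  # runs at most once
        return self._instance(text, **kwargs)


def _load_g2pm():
    # Imported lazily so that merely importing this file never loads the model.
    from g2pM import G2pM
    return G2pM()


g2p = LazyConverter(_load_g2pm)  # nothing is loaded until g2p(...) is called
```

Modules that import g2p from this file share one model instance, and programs that never call it pay no loading cost.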
deprecated The package has not been updated since 2021 and may not work with newer Python versions (e.g., 3.10+). Its only runtime dependency is NumPy; it does not use tensorflow or torch.
fix Consider using an alternative such as pypinyin for hanzi-to-pinyin conversion. (jieba is a word segmenter and does not produce pinyin on its own.)
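If g2pM fails to install or load in a newer environment, one option is to fall back to another backend at runtime. The sketch below is an assumption-laden illustration: load_converter is a hypothetical helper, it assumes g2pM is imported as g2pM and called as model(text, tone=True, char_split=False), and it uses pypinyin's lazy_pinyin with Style.TONE3 as the fallback. The two backends may format some tokens differently (neutral tones, for instance), so treat the outputs as similar rather than identical.

```python
def load_converter():
    """Return (backend_name, callable) for the first backend that loads, else (None, None)."""
    try:
        from g2pM import G2pM
        model = G2pM()
        return "g2pM", lambda text: model(text, tone=True, char_split=False)
    except Exception:
        # Broad on purpose: failure may be a missing package or a model-loading error.
        pass
    try:
        from pypinyin import Style, lazy_pinyin
        return "pypinyin", lambda text: lazy_pinyin(text, style=Style.TONE3)
    except ImportError:
        return None, None
```

Call load_converter() once at startup and check the returned name, so logs record which backend produced the output.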
Imports
- G2pM
wrong: import g2pm
correct: from g2pM import G2pM
Quickstart
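from g2pM import G2pM  # module name is case-sensitive: capital M

g2p = G2pM()  # loads the model bundled with the package

# Convert Chinese text to pinyin with tone numbers
result = g2p('我', tone=True, char_split=False)
print(result)  # a list of pinyin syllables with tone numbers

# For several sentences, call the model once per sentence
results = [g2p(s) for s in ['你好', '我很好']]
print(results)  # one list of syllables per sentence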
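When several inputs need converting, a small wrapper that makes one call per input keeps the results aligned with their sources. This is a minimal sketch: convert_sentences is a hypothetical helper, and it assumes the converter accepts a string plus the tone and char_split keyword arguments and returns a list of tokens.

```python
def convert_sentences(g2p, sentences, tone=True):
    """Run the converter once per sentence and return (sentence, joined_tokens) pairs."""
    pairs = []
    for sentence in sentences:
        tokens = g2p(sentence, tone=tone, char_split=False)
        pairs.append((sentence, " ".join(tokens)))
    return pairs
```

Usage would look like convert_sentences(G2pM(), ['你好', '我很好']); passing the converter in as an argument also makes the helper easy to exercise with a stand-in function in tests.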