g2pM - Neural Grapheme-to-Phoneme for Mandarin Chinese

0.1.2.5 verified Sat May 09 auth: no python maintenance

g2pM is a neural grapheme-to-phoneme conversion package for Mandarin Chinese. It converts Chinese characters to pinyin, using a neural model to disambiguate polyphonic characters. The current version is 0.1.2.5; releases are infrequent, and the last update was in 2021.

pip install g2pm
error ModuleNotFoundError: No module named 'g2pm'
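cause The package is not installed, or the module name is written in the wrong case. The importable module is g2pM (capital M), and Python module names are case-sensitive. The package is also sometimes confused with 'g2p' or 'g2p-en'.
fix
Run: pip install g2pm, then import with the exact case: from g2pM import G2pM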
error AttributeError: module 'g2pm' has no attribute 'G2pM'
cause Incorrect import pattern (e.g., 'import g2pm' then 'g2pm.G2pM()' but G2pM is not a direct attribute of the module).
fix
Use: from g2pM import G2pM (note the capital M in the module name)
error FileNotFoundError: [Errno 2] No such file or directory: '.../g2pm/model/model.pb'
cause Model file missing or corrupted, typically because the installation was incomplete or interrupted.
fix
Reinstall the package (pip install --force-reinstall g2pm) and re-initialize: from g2pM import G2pM; model = G2pM()
gotcha Input must be Chinese text (characters), not pinyin. The model converts characters to pinyin with tone numbers (e.g., 'ni3'); it does not accept pinyin strings as input and does not produce bopomofo/zhuyin.
fix Pass the original Chinese sentence to the model. If you need bopomofo, convert the pinyin output in a separate step.
gotcha The pre-trained model ships inside the package, so nothing is downloaded on first use. Constructing G2pM() loads the model weights from disk each time.
fix Create the G2pM() instance once and reuse it for all subsequent calls.
deprecated The package has not been updated since 2021 and may not work with newer Python versions (e.g., 3.10+). It depends on NumPy rather than TensorFlow or PyTorch, so check NumPy compatibility first if it fails to load.
fix Consider an alternative such as pypinyin for pinyin conversion. jieba covers word segmentation but does not produce pronunciations.
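
The import errors listed above often come down to the exact spelling and case of the module name. The following is a minimal sketch, using only the standard library, of a check that reports whether a given top-level module name can be found before you try to use it. The helper name is_importable is hypothetical and not part of g2pM.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module with this exact name can be found.

    Hypothetical helper for illustration; it is not part of g2pM.
    Module lookup is case-sensitive, so the exact spelling matters.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Compare the two spellings that commonly get mixed up.
    for name in ("g2pM", "g2pm"):
        print(f"{name!r} importable: {is_importable(name)}")
```

If both spellings report False, the package is most likely not installed in the active environment.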

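Initialize G2pM and convert Chinese text to pinyin.

from g2pM import G2pM  # module name is case-sensitive

model = G2pM()
# Convert a Chinese sentence to pinyin syllables with tone numbers
sentence = "然而，他红了20年以后，他竟退出了大家的视线。"
result = model(sentence, tone=True, char_split=False)
print(result)  # a list of pinyin syllables with tone numbers

# Pass tone=False to drop the tone numbers
plain = model(sentence, tone=False, char_split=False)
print(plain)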
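
Pinyin syllables that carry a trailing tone number, such as 'wo3' or 'ni3', are often easier to process once the base syllable and the tone are separated. The sketch below shows one way to do that with the standard library. The helper name split_tone is hypothetical, and the assumption that tones are digits 1 to 5 and that ü may appear as 'ü', 'u:' or 'v' is mine rather than something the package documents here.

```python
import re
from typing import Optional, Tuple

# Assumption: a syllable is lowercase letters (optionally with 'ü' or ':')
# followed by a single tone digit from 1 to 5.
_TONE_RE = re.compile(r"^([a-zü:]+)([1-5])$")


def split_tone(token: str) -> Tuple[str, Optional[int]]:
    """Split 'ni3' into ('ni', 3); return (token, None) for anything else.

    Hypothetical helper for illustration; it is not part of g2pM.
    """
    match = _TONE_RE.match(token.lower())
    if match is None:
        # Punctuation, digits, and other non-pinyin tokens pass through.
        return token, None
    return match.group(1), int(match.group(2))


if __name__ == "__main__":
    for tok in ["ni3", "hao3", "，", "20"]:
        print(tok, "->", split_tone(tok))
```

Tokens that do not look like a toned syllable are returned unchanged with a tone of None, so the helper can be mapped over a mixed list of syllables, punctuation, and numbers.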