Indonesian G2P (Grapheme-to-Phoneme)
Version: 0.4.2, verified Sat May 09. Authentication: none. Language: Python.
A library for converting Indonesian text to phoneme sequences using a hybrid approach: rule-based conversion enhanced with an ONNX-based neural model. The current version is 0.4.2, and the project is under active development on GitHub.
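To illustrate what the rule-based half of such a pipeline does, here is a toy greedy converter for standard Indonesian spelling. It is not the library's implementation: the mapping tables and the function name `rule_g2p` are assumptions. It also leaves the ambiguous letter 'e' (which can be /e/ or /ə/) untouched, because spelling alone cannot resolve it.

```python
# Toy rule-based Indonesian grapheme-to-phoneme sketch (not the g2p-id-py code).
# Digraphs are matched before single letters (greedy longest match).
DIGRAPHS = {"ng": "ŋ", "ny": "ɲ", "sy": "ʃ", "kh": "x"}
SINGLES = {"c": "tʃ", "j": "dʒ", "y": "j"}


def rule_g2p(word: str) -> list[str]:
    """Convert one word to a phoneme list using fixed spelling rules.

    Letters not in the tables map to themselves; 'e' stays 'e' even where
    it should be a schwa. Non-letters (punctuation, digits) are skipped.
    """
    text = word.lower()
    phonemes: list[str] = []
    i = 0
    while i < len(text):
        pair = text[i : i + 2]
        if pair in DIGRAPHS:
            phonemes.append(DIGRAPHS[pair])
            i += 2
            continue
        ch = text[i]
        if ch.isalpha():
            phonemes.append(SINGLES.get(ch, ch))
        i += 1
    return phonemes


print(rule_g2p("nyanyi"))  # ['ɲ', 'a', 'ɲ', 'i']
print(rule_g2p("cari"))    # ['tʃ', 'a', 'r', 'i']
```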
pip install g2p-id-py
Common errors
error ImportError: cannot import name 'IndonesianG2P' from 'g2p_id'
cause Package not installed or wrong import path (e.g., using 'from g2p_id import g2p_id' or similar).
fix
Install with pip install g2p-id-py (the distribution name differs from the module name), then import with: from g2p_id import IndonesianG2P
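A quick way to tell "not installed" apart from "wrong import path" is to ask the standard library. This sketch uses `importlib.metadata` and `importlib.util.find_spec`; the helper name `diagnose_g2p_id` is hypothetical, and the default names are the distribution and module names given above.

```python
import importlib.util
from importlib import metadata


def diagnose_g2p_id(dist_name: str = "g2p-id-py", module_name: str = "g2p_id") -> str:
    """Report whether the distribution is installed and its module is importable."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return f"{dist_name} is not installed; run: pip install {dist_name}"
    if importlib.util.find_spec(module_name) is None:
        return f"{dist_name} {version} is installed but module {module_name!r} was not found"
    return f"{dist_name} {version} is installed; import from {module_name}"


print(diagnose_g2p_id())
```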
error LookupError: Resource punkt not found. Please use the NLTK Downloader to obtain the resource:
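cause NLTK's punkt tokenizer data has not been downloaded. TweetTokenizer itself is regex-based and does not load punkt, so the lookup is presumably triggered by an NLTK call that does use it, such as word_tokenize or sent_tokenize.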
fix
Run: import nltk; nltk.download('punkt')
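To avoid re-downloading on every run, the download can be guarded by a lookup first. This is a sketch built on NLTK's `nltk.data.find` and `nltk.download`; the function name is illustrative, and it assumes network access the first time the data is fetched.

```python
def ensure_punkt() -> None:
    """Download NLTK's punkt data only if it is not already available."""
    import nltk  # imported lazily so this helper can be defined without NLTK present

    try:
        # Raises LookupError when the resource is missing from nltk.data.path.
        nltk.data.find("tokenizers/punkt")
    except LookupError:
        nltk.download("punkt", quiet=True)
```

Call `ensure_punkt()` once at start-up, before constructing IndonesianG2P.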
error TypeError: __init__() got an unexpected keyword argument 'model_path'
cause Older versions of IndonesianG2P accepted a 'model_path' parameter, which was later removed or renamed.
fix
Check the installed version with pip show g2p-id-py. Then use the default constructor, IndonesianG2P(), or consult the documentation for the current constructor signature.
Warnings
breaking In v0.4.2, a glottal stop (ʔ) is inserted between consecutive vowels (e.g., 'hai' -> ['h', 'a', 'ʔ', 'i']), which changes the output compared with earlier versions.
fix If you rely on the old behavior, pin to an earlier release (pip install 'g2p-id-py<0.4.2') or adjust your phoneme post-processing.
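If downstream code expects the pre-0.4.2 output, one option is to post-process. The sketch below removes a 'ʔ' only when both of its neighbors are vowels. The vowel inventory is an assumption about the symbols the package emits, the function name is hypothetical, and the rule cannot distinguish an inserted glottal stop from one that earlier versions would also have produced between vowels.

```python
# Assumed vowel symbols; check them against the package's actual phoneme set.
VOWELS = frozenset({"a", "i", "u", "e", "o", "ə", "ɛ", "ɔ"})


def strip_intervocalic_glottal(phonemes: list[str]) -> list[str]:
    """Drop 'ʔ' when it sits directly between two vowels."""
    out: list[str] = []
    for i, p in enumerate(phonemes):
        prev_is_vowel = i > 0 and phonemes[i - 1] in VOWELS
        next_is_vowel = i + 1 < len(phonemes) and phonemes[i + 1] in VOWELS
        if p == "ʔ" and prev_is_vowel and next_is_vowel:
            continue  # skip the glottal stop between two vowels
        out.append(p)
    return out


print(strip_intervocalic_glottal(["h", "a", "ʔ", "i"]))  # ['h', 'a', 'i']
```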
breaking In v0.4.2, every 'k' grapheme maps to the 'k' phoneme; previously, some 'k's were mapped to 'ʔ'. This may affect downstream tasks such as ASR.
fix Check your phoneme expectations; update any mappings that assumed 'ʔ' for 'k'.
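If a downstream model was trained on labels where some 'k' graphemes surfaced as 'ʔ', the new output may need remapping. Which 'k's older versions converted is not specified here, so this sketch assumes the common Indonesian pattern of a word-final k realized as a glottal stop (as in 'anak'). Both the rule and the function name are hypothetical, and the input is one phoneme list per word.

```python
def final_k_to_glottal(words: list[list[str]]) -> list[list[str]]:
    """Replace a word-final 'k' with 'ʔ' in each word's phoneme list.

    Hypothetical approximation of older labels; verify against your own data.
    """
    return [w[:-1] + ["ʔ"] if w and w[-1] == "k" else list(w) for w in words]


print(final_k_to_glottal([["a", "n", "a", "k"], ["k", "a", "k", "a"]]))
# [['a', 'n', 'a', 'ʔ'], ['k', 'a', 'k', 'a']]
```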
gotcha The package depends on NLTK's TweetTokenizer. As of v0.3.5, the NLTK version is pinned because of a backward incompatibility in NLTK >=3.8.1; installing a conflicting NLTK version may break the package.
fix Use the pinned version: pip install 'nltk==3.8' or see issue #16.
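A start-up check can catch a conflicting NLTK before tokenization fails. The sketch below, with a hypothetical helper name, reads the installed version through `importlib.metadata` and treats anything at or above 3.8.1 as incompatible, following the note above. Its version parsing is deliberately simple and ignores pre-release suffixes.

```python
import warnings
from importlib import metadata


def check_nltk_pin(first_bad: tuple[int, ...] = (3, 8, 1), dist_name: str = "nltk") -> bool:
    """Return True if the installed NLTK is older than `first_bad`.

    Emits a warning and returns False if NLTK is missing or too new.
    """
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        warnings.warn(f"{dist_name} is not installed")
        return False
    # Keep only the leading numeric components, e.g. "3.8" -> (3, 8).
    parts: list[int] = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    if tuple(parts) >= first_bad:
        warnings.warn(f"{dist_name} {version} may be incompatible; try: pip install 'nltk==3.8'")
        return False
    return True
```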
gotcha The ONNX model file is loaded with ONNX Runtime, whose InferenceSession objects cannot be pickled. To serialize an IndonesianG2P object (e.g., with pickle), use v0.3.7+, where the InferenceSession is wrapped.
fix Upgrade to >=0.3.7, or handle serialization manually.
Imports
- IndonesianG2P
from g2p_id import IndonesianG2P
Quickstart
from g2p_id import IndonesianG2P
g2p = IndonesianG2P()
text = "Halo, apa kabar?"
phonemes = g2p.g2p(text)
print(phonemes)
# Output: ['h', 'a', 'l', 'o', 'ʔ', 'a', 'p', 'a', 'k', 'a', 'b', 'a', 'r']