Korean Grapheme-to-Phoneme (g2pkk)
g2pkk is a grapheme-to-phoneme (G2P) conversion module for Korean text that aims for cross-platform compatibility. The current version is 0.1.2. The project is at an early stage of development, and its releases appear to be active but infrequent.
Common errors
- LookupError: Resource 'punkt' not found. Please use the NLTK Downloader to obtain the resource:
  cause: The NLTK 'punkt' tokenizer data, required by g2pkk, has not been downloaded.
  fix: Run `import nltk; nltk.download('punkt')` in your Python environment. This typically needs to be done only once.
- ModuleNotFoundError: No module named 'g2pkk'
  cause: The `g2pkk` package is not installed in your current Python environment.
  fix: Install the package using pip: `pip install g2pkk`.
- AttributeError: 'G2pkk' object has no attribute 'some_method'
  cause: The code calls a method that does not exist, or that has been removed or renamed, in the `G2pkk` class.
  fix: Refer to the official g2pkk documentation or GitHub README for the correct API usage. If you have updated the library, check the release notes for breaking changes.
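The last two errors can be diagnosed before any conversion runs. The following is a minimal sketch using only the standard library; `is_installed` and `public_callables` are hypothetical helper names introduced here for illustration and are not part of g2pkk.

```python
import importlib.util
from typing import List


def is_installed(module_name: str) -> bool:
    """Return True if the import system can locate the top-level module."""
    return importlib.util.find_spec(module_name) is not None


def public_callables(obj: object) -> List[str]:
    """Return the sorted names of public, callable attributes on obj."""
    return sorted(
        name
        for name in dir(obj)
        if not name.startswith("_") and callable(getattr(obj, name, None))
    )


# Hypothetical usage: check for the package before importing it.
if not is_installed("g2pkk"):
    print("g2pkk is not installed; run: pip install g2pkk")
```

Calling `public_callables` on an instance of the converter lists the methods that actually exist in the installed version, which helps when an `AttributeError` follows an upgrade.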
Warnings
- gotcha The `nltk` library, a dependency of g2pkk, requires the 'punkt' tokenizer resource to be downloaded separately. Failure to do so will result in a `LookupError`.
- gotcha g2pkk is in early development (version 0.1.x), which means its API might undergo non-backward-compatible changes in future minor or patch releases as it approaches a stable 1.0 version.
- gotcha Input text must be primarily Korean. While it handles numbers and some English, its core functionality and accuracy are designed for Korean grapheme-to-phoneme conversion.
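Since accuracy depends on the input being primarily Korean, it can help to measure how much of a string is Hangul before converting it. The sketch below is an illustration only: `is_hangul` and `hangul_ratio` are hypothetical helpers, not part of g2pkk, and any cutoff applied to the ratio is an assumption to tune for your data.

```python
def is_hangul(ch: str) -> bool:
    """Return True if ch falls in one of the main Hangul Unicode blocks."""
    code = ord(ch)
    return (
        0xAC00 <= code <= 0xD7A3      # Hangul Syllables
        or 0x1100 <= code <= 0x11FF   # Hangul Jamo
        or 0x3130 <= code <= 0x318F   # Hangul Compatibility Jamo
    )


def hangul_ratio(text: str) -> float:
    """Return the share of alphabetic characters in text that are Hangul."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0  # no letters at all (e.g. digits or punctuation only)
    return sum(is_hangul(ch) for ch in letters) / len(letters)
```

Digits, whitespace, and punctuation are ignored, so a sentence such as "한국어 abc" yields a ratio of 0.5 because three of its six letters are Hangul.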
Install
- pip install g2pkk
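Because the 0.1.x series may change its API between releases, it can be useful to confirm which version was actually installed. A minimal sketch using the standard library's `importlib.metadata` follows; `installed_version` is a hypothetical helper name, and the `0.1.` prefix check simply reflects the version series described on this page.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version string, or None if not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version("g2pkk")
if found is None:
    print("g2pkk is not installed")
elif not found.startswith("0.1."):
    print(f"g2pkk {found} is outside the 0.1.x series described here")
else:
    print(f"g2pkk {found}")
```

To stay on a known release, the version can be pinned at install time, for example `pip install "g2pkk==0.1.2"`.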
Imports
- G2pkk
from g2pkk import G2pkk
Quickstart
import nltk
from g2pkk import G2pkk

# Important: download the NLTK 'punkt' resource if it is not already present.
# nltk.data.find raises LookupError when the resource is missing.
try:
    nltk.data.find('tokenizers/punkt')
except LookupError:
    print("Downloading NLTK 'punkt' resource...")
    nltk.download('punkt')

# Initialize the G2P converter
g2p = G2pkk()

# Convert Korean text to its phoneme representation
text = "안녕하세요 g2p 입니다. 반갑습니다. 123."
result = g2p(text)
print(f"Original: {text}")
print(f"Phonetic: {result}")

# Example with a single word
word_result = g2p("한국어")
print(f"'한국어' phonetic: {word_result}")
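The Quickstart calls the converter object like a function on one string at a time. For several lines of text, a small wrapper can pair each input with its output. This is a sketch under that assumption: `convert_lines` is a hypothetical helper, and it accepts any callable that maps a string to a string, so it does not depend on g2pkk itself.

```python
from typing import Callable, Iterable, List, Tuple


def convert_lines(
    lines: Iterable[str], convert: Callable[[str], str]
) -> List[Tuple[str, str]]:
    """Apply convert to each non-blank line and pair input with output."""
    results = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        results.append((stripped, convert(stripped)))
    return results
```

With the `g2p` object from the Quickstart, a call such as `convert_lines(["안녕하세요", "반갑습니다"], g2p)` would return a list of (original, phonetic) pairs.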