Misaki G2P Engine
Misaki is a Grapheme-to-Phoneme (G2P) engine for Text-to-Speech (TTS) applications, converting written text into phonemes. It primarily supports English with dictionary-based lookups and offers configurable fallbacks, including rule-based systems like `espeak-ng` and optional neural network models. Designed to be lightweight and efficient, Misaki is often integrated into larger TTS systems like Kokoro. The current version is 0.9.4, and the project shows active development with ongoing maintenance and issue resolution on GitHub.
Warnings
- breaking Misaki versions 0.9.4 and higher currently do not support Python 3.13, and installation issues have been reported with other Python versions. Compatibility is typically best with Python 3.10 or 3.11.
- gotcha The `espeak-ng` library, commonly used as a fallback for out-of-dictionary words in Misaki, is an external system dependency. It must be installed separately from the Python package, and its installation method varies by operating system (e.g., `apt-get install espeak-ng` on Debian/Ubuntu). Failure to install it will result in words not found in Misaki's internal dictionaries being spelled out letter-by-letter or marked as unknown.
- gotcha The phoneme set used by Misaki for English is specifically designed for optimal performance in neural networks and may not strictly adhere to traditional linguistic IPA representations. The author notes that some symbols might be 'butchered or reappropriated'. This can lead to unexpected phoneme mappings for linguists or users expecting strict IPA compliance.
- gotcha Enabling transformer-based POS tagging (`trf=True`), or installing `misaki` as a dependency of another library such as KittenTTS, can pull in heavy dependencies such as `torch` and NVIDIA CUDA packages. These can add several gigabytes to the installation, even when no GPU is used or when Misaki itself ends up running with `trf=False`.
- gotcha Misaki's current implementation may have limitations in non-POS-based homograph disambiguation (e.g., distinguishing 'graph axes' from 'throwing axes'). While it handles some POS-based disambiguation, more complex contextual disambiguation remains a 'TODO'.
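The version and dependency warnings above can be checked before installing. The following is a minimal sketch using only the standard library. The names `python_status` and `heavy_deps_present` are hypothetical helpers, not part of Misaki, and treating every release from 3.13 onward as unsupported is an assumption based on the warning above.

```python
import importlib.util
import sys

# Assumption: 3.10 and 3.11 are the "best compatibility" versions named in the
# warnings, and 3.13 (and, by assumption, anything later) is unsupported.
RECOMMENDED = {(3, 10), (3, 11)}
UNSUPPORTED_FROM = (3, 13)


def python_status(version=None):
    """Classify a (major, minor) pair as 'recommended', 'unsupported', or 'untested'."""
    major_minor = tuple(version or sys.version_info[:2])[:2]
    if major_minor >= UNSUPPORTED_FROM:
        return "unsupported"
    if major_minor in RECOMMENDED:
        return "recommended"
    return "untested"


def heavy_deps_present():
    """Report whether torch is importable, without actually importing it."""
    return importlib.util.find_spec("torch") is not None


if __name__ == "__main__":
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {python_status()}")
    print(f"torch already installed: {heavy_deps_present()}")
```

Running the check in the target environment before and after `pip install` shows whether the installation pulled in `torch`.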
Install
- pip install "misaki[en]"
- sudo apt-get install espeak-ng
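Because `espeak-ng` is installed outside of pip, it may help to confirm it is present before relying on the fallback. The sketch below only looks for the `espeak-ng` executable on `PATH`. It assumes that finding the command-line binary implies the library is installed as well, which may not hold on every system. `find_espeak_ng` is a hypothetical helper, not part of Misaki.

```python
import shutil


def find_espeak_ng(which=shutil.which):
    """Return the path of the espeak-ng executable, or None if it is not on PATH."""
    return which("espeak-ng")


path = find_espeak_ng()
if path is None:
    print("espeak-ng not found on PATH; install it with your OS package manager")
else:
    print(f"espeak-ng found at {path}")
```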
Imports
- G2P
from misaki import en
g2p_engine = en.G2P(...)
Quickstart
from misaki import en
# Initialize G2P for American English, no transformer, no external fallback
g2p = en.G2P(trf=False, british=False, fallback=None)
text = "Misaki is a G2P engine designed for Text-to-Speech models."
phonemes, tokens = g2p(text)
print(f"Text: {text}")
print(f"Phonemes: {phonemes}")
# Example with espeak-ng fallback (requires espeak-ng installed on system)
# from misaki import espeak
# fallback_espeak = espeak.EspeakFallback(british=False)
# g2p_with_fallback = en.G2P(trf=False, british=False, fallback=fallback_espeak)
# text_ood = "Now outofdictionary words are handled by espeak."
# phonemes_ood, _ = g2p_with_fallback(text_ood)
# print(f"Text (OOD): {text_ood}")
# print(f"Phonemes (OOD): {phonemes_ood}")