Phonemizer
Phonemizer is a simple Python library for converting text into phonemes for multiple languages. It acts as a wrapper around several phonemization backends: eSpeak (espeak or espeak-ng), eSpeak with MBROLA voices (espeak-mbrola), Festival, and segments. The library is actively maintained, with regular releases that fix bugs, improve performance, and add new features.
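Because most backends depend on software installed outside of pip, it can help to check up front which ones actually work on a given machine. The sketch below assumes the phonemizer 3.x layout, where `phonemizer.backend` exports `EspeakBackend`, `EspeakMbrolaBackend`, `FestivalBackend` and `SegmentsBackend`, each with an `is_available()` classmethod. The helper name `available_backends` is hypothetical, and any error raised while probing a backend is treated as "not available".

```python
from typing import Dict


def available_backends() -> Dict[str, bool]:
    """Map each phonemizer backend name to whether it can run here.

    Returns an empty dict when phonemizer itself is not installed.
    """
    try:
        # Assumed phonemizer 3.x layout: backend classes in phonemizer.backend.
        from phonemizer.backend import (
            EspeakBackend,
            EspeakMbrolaBackend,
            FestivalBackend,
            SegmentsBackend,
        )
    except ImportError:
        return {}

    candidates = {
        "espeak": EspeakBackend,
        "espeak-mbrola": EspeakMbrolaBackend,
        "festival": FestivalBackend,
        "segments": SegmentsBackend,
    }
    status = {}
    for name, backend_cls in candidates.items():
        try:
            # is_available() is assumed to be a classmethod that probes for
            # the backend's program or library without phonemizing anything.
            status[name] = bool(backend_cls.is_available())
        except Exception:
            # Treat any failure while probing as "not available".
            status[name] = False
    return status


if __name__ == "__main__":
    for name, ok in available_backends().items():
        print(f"{name:15} {'available' if ok else 'missing'}")
```

Running a check like this at startup gives a clearer message than a failed `phonemize()` call deep inside a larger pipeline.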
Warnings
- breaking Phonemizer v3.3.0 and later require Python >= 3.8. Previous versions supported Python >= 3.6. Attempting to install or run on older Python versions will result in dependency resolution or runtime errors.
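- gotcha Except for segments, which is a pure-Python dependency, Phonemizer's backends wrap programs that must be installed at the system level: eSpeak or eSpeak NG for `espeak`, eSpeak plus MBROLA for `espeak-mbrola`, and Festival for `festival`. `pip install phonemizer` does not install these programs, and phonemizing with a backend whose program is missing fails with an error. eSpeak is the recommended and default backend.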
- breaking In v3.0, the default phonemization backend changed from `festival` to `espeak`. If your application implicitly relied on `festival` and does not explicitly specify a backend, its output may change significantly or break if `espeak` is not available.
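- breaking Handling of empty lines changed in v3.0 and is controlled by the `preserve_empty_lines` argument of `phonemize()`. Depending on the version and that setting, empty input lines are either dropped or kept, so the output list may be shorter than the input or contain unexpected empty strings. If downstream code expects exactly one output entry per input line, set `preserve_empty_lines` explicitly instead of relying on the default.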
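One way to make the output length independent of any version's empty-line default is to send only non-blank lines to the backend and re-insert empty strings afterwards. This is a minimal sketch, not part of phonemizer: the helper `phonemize_aligned` is hypothetical, it treats whitespace-only lines as blank (an assumption), and it takes the phonemizing function as a parameter so it can run with a stand-in when eSpeak is not installed.

```python
from typing import Callable, List, Sequence


def phonemize_aligned(
    texts: Sequence[str],
    phonemize_fn: Callable[[List[str]], List[str]],
) -> List[str]:
    """Phonemize `texts`, returning exactly one output entry per input line.

    Blank (empty or whitespace-only) lines are never sent to the backend and
    come back as "" in their original positions.
    """
    # Remember where the non-blank lines are, then phonemize only those.
    positions = [i for i, line in enumerate(texts) if line.strip()]
    non_blank = [texts[i] for i in positions]
    phonemized = list(phonemize_fn(non_blank)) if non_blank else []
    # Fail loudly instead of silently misaligning inputs and outputs.
    if len(phonemized) != len(non_blank):
        raise ValueError(
            f"expected {len(non_blank)} phonemized lines, got {len(phonemized)}"
        )
    result = [""] * len(texts)
    for i, phones in zip(positions, phonemized):
        result[i] = phones
    return result


if __name__ == "__main__":
    # Stand-in for a real backend so this runs without eSpeak installed.
    fake = lambda lines: [line.upper() for line in lines]
    print(phonemize_aligned(["Hello", "", "world"], fake))  # ['HELLO', '', 'WORLD']
```

With the real library, `phonemize_fn` could be `functools.partial(phonemize, language='en-us', backend='espeak')`, which also pins the backend explicitly rather than relying on the default that changed in v3.0.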
Install
-
pip install phonemizer
Imports
- phonemize
from phonemizer import phonemize
Quickstart
from phonemizer import phonemize
from phonemizer.separator import Separator
# NOTE: The espeak backend requires eSpeak or eSpeak NG to be installed on
# your system; pip does not install it. For Debian/Ubuntu:
# sudo apt-get install espeak-ng
# For macOS:
# brew install espeak
texts = [
    "Hello, world!",
    "This is a test.",
]
# 'espeak' has been the default backend since v3.0. Passing language and
# backend explicitly keeps the output stable across phonemizer versions.
phonemes = phonemize(texts, language='en-us', backend='espeak')
print("Original texts:", texts)
print("Phonemes:", phonemes)
# Example with custom separators. `separator` takes a Separator object, not a
# plain string: here phones are joined with '_' and words with ' '.
# strip=True removes the trailing phone and word separators.
phonemes_with_separator = phonemize(
    texts,
    language='en-us',
    backend='espeak',
    separator=Separator(phone='_', word=' '),
    strip=True,
)
print("Phonemes with custom separator:", phonemes_with_separator)
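When phones and words are joined with distinct, non-empty separators, for example `Separator(phone='_', word=' ')` from `phonemizer.separator`, each output line can be split back into per-word phone lists for further processing. The helper `split_phonemized` below is a hypothetical sketch, not a phonemizer function; it drops empty pieces so it also tolerates the trailing separators left when `strip=False`. It does not work with phonemizer's default empty phone separator, since there are then no phone boundaries to split on.

```python
from typing import List


def split_phonemized(
    line: str, phone_sep: str = "_", word_sep: str = " "
) -> List[List[str]]:
    """Split one phonemized line into words, each a list of phones.

    Both separators must be non-empty and different from each other.
    Empty pieces are dropped, so trailing separators do not create
    empty words or phones.
    """
    words = [w for w in line.split(word_sep) if w]
    return [[p for p in w.split(phone_sep) if p] for w in words]


if __name__ == "__main__":
    # Illustrative string in the Separator(phone='_', word=' ') format,
    # written by hand rather than captured from a real eSpeak run.
    print(split_phonemized("h_ə_l_oʊ_ w_ɜː_l_d_ "))
    # [['h', 'ə', 'l', 'oʊ'], ['w', 'ɜː', 'l', 'd']]
```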