gruut

2.4.0 verified Mon Apr 27 auth: no python

A tokenizer, text cleaner, and phonemizer for many human languages. Current version: 2.4.0. Release cadence: irregular, with major version bumps introducing API and CLI changes.

pip install gruut
error ModuleNotFoundError: No module named 'gruut.lang'
cause Missing language data package for the requested language.
fix
Install the corresponding gruut-lang-<lang> package, e.g., pip install gruut-lang-en.
error AttributeError: module 'gruut' has no attribute 'text_to_phonemes'
cause Using old API from gruut v1.x.
fix
Use gruut.sentences() instead. See quickstart.
error ValueError: Unknown language: en
cause The language code is not recognized because the corresponding language data package is not installed.
fix
Install the language data package for 'en' (e.g., gruut-lang-en).
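The ModuleNotFoundError and "Unknown language" errors above both come down to a missing language data package, so a pre-flight check can help. The following is a minimal sketch, assuming each gruut-lang-<lang> distribution installs an importable module named gruut_lang_<lang> (an assumption about the packaging, not something stated above). The helper missing_lang_packages is hypothetical and not part of gruut.

```python
import importlib.util


def missing_lang_packages(langs):
    """Return pip package names for languages whose data module is not importable."""
    missing = []
    for lang in langs:
        # 'en-us' and 'en_US' both reduce to the base language 'en'
        base = lang.replace("_", "-").split("-")[0].lower()
        # Assumed module naming: gruut-lang-en installs gruut_lang_en
        module_name = f"gruut_lang_{base}"
        if importlib.util.find_spec(module_name) is None:
            missing.append(f"gruut-lang-{base}")
    return missing
```

For example, printing "pip install " followed by each name returned by missing_lang_packages(["en-us", "de-de"]) lists the install commands that are still needed.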
breaking API change in v2.0.0: The old inline pronunciation and <number>_<format> syntax were removed. Use SSML instead.
fix Migrate to SSML tags like <say-as interpret-as='date'> for date/time expansion.
breaking CLI change in v2.0.0: Use the 'gruut sentences' subcommand instead of calling 'gruut' directly.
fix Run 'gruut sentences' instead of 'gruut'.
breaking English language data moved to a separate package in v2.1.0. You must install gruut-lang-en explicitly.
fix Run 'pip install gruut-lang-en' or install gruut with the langs extra: 'pip install gruut[langs]'.
deprecated Python 3.6 support is no longer guaranteed. The latest releases still declare Python >=3.6, but newer dependencies may not support it.
fix Use Python 3.7 or newer.
gotcha The 'gruut' package does not include any language data by default. You must install language-specific packages.
fix Install gruut-lang-<lang> for your language (e.g., gruut-lang-en).
gotcha When using stdin with --ssml in CLI v2.1.0+, the input is read as a single SSML document unless --stdin-format lines is set.
fix Use --stdin-format lines for line-by-line input.
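The SSML migration mentioned in the v2.0.0 notes above can be sketched from Python as follows. This is an illustration under assumptions: date_ssml and spoken_words are hypothetical helper names, and the example relies on the ssml=True argument of gruut.sentences() to have the input parsed as SSML. It needs gruut and the English data package to actually run spoken_words.

```python
from xml.sax.saxutils import escape


def date_ssml(date_text):
    """Wrap a date string in SSML so it is interpreted as a date."""
    return (
        "<speak>"
        f'<say-as interpret-as="date">{escape(date_text)}</say-as>'
        "</speak>"
    )


def spoken_words(ssml_text, lang="en-us"):
    """Return the expanded word texts for an SSML document (requires gruut)."""
    import gruut  # imported lazily so date_ssml works without gruut installed

    words = []
    # ssml=True asks gruut.sentences() to parse the input as SSML
    for sentence in gruut.sentences(ssml_text, lang=lang, ssml=True):
        words.extend(word.text for word in sentence)
    return words
```

Calling spoken_words(date_ssml("4/1/2021")) should yield the date expanded into words rather than the raw digits, though the exact wording depends on the installed language data.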
pip install gruut[langs]

Tokenize and phonemize a sentence with gruut.

import gruut

text = "Hello world."
# Use a language code (e.g., 'en-us')
for sentence in gruut.sentences(text, lang='en-us'):
    for word in sentence:
        if word.phonemes:
            print(f"{word.text}: {' '.join(word.phonemes)}")
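
When the phonemes are needed as data rather than printed output, the same loop can be wrapped in a small function. This is a sketch only: phonemes_by_word is a hypothetical helper, not part of gruut, and it relies solely on the word.text and word.phonemes attributes used in the quickstart.

```python
def phonemes_by_word(sentences):
    """Collect (word text, phoneme string) pairs from gruut-style sentences."""
    pairs = []
    for sentence in sentences:
        for word in sentence:
            # Skip tokens without phonemes, as the quickstart loop does
            if word.phonemes:
                pairs.append((word.text, " ".join(word.phonemes)))
    return pairs
```

With gruut and English data installed, phonemes_by_word(gruut.sentences("Hello world.", lang="en-us")) would return one pair per phonemized word.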