CMU Pronouncing Dictionary Python Wrapper
CMUdict (cmudict) is a Python wrapper package for the CMU Pronouncing Dictionary data files, providing access to over 134,000 English words and their ARPAbet pronunciations. It exposes the data with minimal assumptions about how it will be used. The library is actively maintained: patch releases are frequent, usually for dependency updates or minor fixes, and occasional minor version bumps add features such as type hints.
Warnings
- gotcha The functions `cmudict.dict()` and `cmudict.entries()` return the same data in different structures and are often confused. `cmudict.dict()` maps each unique word to a list of its pronunciations, while `cmudict.entries()` returns a list of (word, pronunciation) tuples, so a word with multiple pronunciations appears once per pronunciation.
- gotcha Looking up a word that is not in the CMU Pronouncing Dictionary returns `None` with `cmudict.dict().get(word)` and yields an empty list when filtering `cmudict.entries()`. The dictionary is comprehensive but does not cover every English word, and it does not contain numerals (numbers should be spelled out before lookup).
- breaking Versions `1.0.7`, `1.0.8`, and `1.0.9` were yanked from PyPI due to a 'broken deployment process'. Attempting to install these specific older versions will likely fail or lead to unexpected behavior.
- deprecated As of v1.1.1, the package's internal type hints use the built-in generic types (`list`, `tuple`) instead of the deprecated `typing.List` and `typing.Tuple`. This does not break anything for most users; it reflects a move towards modern Python typing conventions.
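The difference between the two structures, and the missing-word case, can be sketched without loading the real data. The example below is only an illustration: `sample_entries` is a small hand-written list in the (word, pronunciation) shape described above, and `group_entries` and `lookup` are hypothetical helpers, not part of the cmudict package. The lowercase normalization in `lookup` assumes lowercase keys, as in this sample.

```python
from collections import defaultdict

# Hand-written sample in the entries() shape: one tuple per pronunciation.
# This is illustrative data, not read from the cmudict package.
sample_entries = [
    ("read", ["R", "EH1", "D"]),
    ("read", ["R", "IY1", "D"]),
    ("hello", ["HH", "AH0", "L", "OW1"]),
]


def group_entries(entries):
    """Group (word, pronunciation) tuples into a dict()-style mapping."""
    grouped = defaultdict(list)
    for word, phonemes in entries:
        grouped[word].append(phonemes)
    return dict(grouped)


def lookup(pron_dict, word):
    """Return all pronunciations for a word, or an empty list if it is absent.

    Assumes keys are lowercase, as in the sample above.
    """
    return pron_dict.get(word.lower(), [])


sample_dict = group_entries(sample_entries)

print(len(sample_entries))          # one item per pronunciation
print(len(sample_dict))             # one key per unique word
print(lookup(sample_dict, "Read"))  # both pronunciations of "read"
print(lookup(sample_dict, "42"))    # numerals are not present
```

Returning an empty list rather than `None` lets calling code iterate over the result without a separate check, which is a design choice for this sketch rather than the package's behavior.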
Install
pip install cmudict
Imports
- cmudict
import cmudict
- dict
cmudict.dict()
Quickstart
import cmudict

# Get the full dictionary as a mapping from word to a list of pronunciations
pron_dict = cmudict.dict()

word = "hello"
pronunciations = pron_dict.get(word)

if pronunciations:
    print(f"Pronunciations for '{word}': {pronunciations}")
    # Example: Accessing the first pronunciation and its phonemes
    first_pronunciation_phonemes = pronunciations[0]
    print(f"First pronunciation phonemes: {first_pronunciation_phonemes}")
else:
    print(f"'{word}' not found in CMUdict.")

# To get all entries as (word, pronunciation) tuples (e.g., for iteration)
all_entries = cmudict.entries()
# print(f"Total entries (including variants): {len(all_entries)}")

# Example of getting pronunciations via entries() (less direct for single word lookup)
# target_word = "example"
# example_pronunciations = [p for w, p in all_entries if w == target_word]
# print(f"Pronunciations for '{target_word}' (from entries): {example_pronunciations}")
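Once a pronunciation is in hand, a common next step is to read information out of its ARPAbet phonemes. In ARPAbet, vowel phonemes carry a trailing stress digit (0 for unstressed, 1 for primary, 2 for secondary stress), so counting them gives a syllable count. The sketch below assumes a pronunciation is a list of phoneme strings, as in the quickstart; `count_syllables` and `stress_pattern` are hypothetical helpers, not functions provided by cmudict, and the sample pronunciation is written inline so the snippet runs without the package.

```python
def count_syllables(phonemes):
    """Count vowel phonemes, identified by their trailing stress digit."""
    return sum(1 for phoneme in phonemes if phoneme[-1].isdigit())


def stress_pattern(phonemes):
    """Return the stress digits of the vowel phonemes, in order."""
    return "".join(phoneme[-1] for phoneme in phonemes if phoneme[-1].isdigit())


# One pronunciation of "hello", written inline for illustration.
hello = ["HH", "AH0", "L", "OW1"]

print(count_syllables(hello))  # 2
print(stress_pattern(hello))   # 01
```

With the real data, the same helpers could be applied to each item of `pron_dict.get(word)`, keeping in mind that different pronunciations of one word may give different counts.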