Pyphen
Pyphen is a pure-Python module that hyphenates text using existing Hunspell hyphenation dictionaries. It bundles many dictionaries taken from the LibreOffice project. The current version is 0.17.2; the project is actively developed, and releases typically update the bundled dictionaries and adjust which Python versions are supported.
Warnings
- breaking Pyphen regularly drops support for older Python versions. For instance, v0.17.0 dropped Python 3.8, v0.15.0 dropped 3.7, and v0.12.0 dropped 3.6. Ensure your Python environment meets the `requires_python` specification, which is currently `>=3.9`.
- gotcha Initializing a `Pyphen` instance, particularly with large dictionaries, can be a time-consuming operation, especially on resource-constrained devices like a Raspberry Pi Zero where it might take tens of seconds. The delay occurs during dictionary parsing and setup.
- gotcha Hyphenation quality depends on the rules in the underlying Hunspell dictionaries. Some words or language-specific cases can produce unexpected or incorrect hyphenation points (e.g., issues reported for Polish and German words).
- deprecated Since version 0.15.0, Pyphen loads its bundled dictionaries through `importlib.resources` instead of `pkg_resources`. `pkg_resources` may still work in older setups, but it belongs to setuptools rather than the standard library, and setuptools has deprecated it.
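The first two warnings can be handled defensively in application code. The following is a minimal sketch, not part of Pyphen itself: `python_is_supported` and `get_hyphenator` are hypothetical helper names, and the `(3, 9)` floor simply mirrors the `>=3.9` requirement quoted above. It checks the interpreter version and caches each `Pyphen` instance so that a dictionary is parsed only once per language.

```python
import sys
from functools import lru_cache

# Minimum Python version, taken from the `>=3.9` requirement quoted above.
MIN_PYTHON = (3, 9)


def python_is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) part of version_info meets MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


@lru_cache(maxsize=None)
def get_hyphenator(lang):
    """Build a Pyphen instance once per language and reuse it afterwards."""
    if not python_is_supported():
        raise RuntimeError("Python >= 3.9 is needed for current Pyphen releases")
    # Imported lazily so the version check runs before Pyphen is loaded.
    import pyphen
    return pyphen.Pyphen(lang=lang)
```

With this helper, calling `get_hyphenator('fr_FR')` twice returns the same object, so the setup cost is paid only on the first call.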
Install
pip install pyphen
Imports
- Pyphen
from pyphen import Pyphen
- language_fallback
import pyphen; pyphen.language_fallback(...)
- LANGUAGES
import pyphen; pyphen.LANGUAGES
Quickstart
import pyphen
# Instantiate Pyphen for a specific language (e.g., French)
dic = pyphen.Pyphen(lang='fr_FR')
# Check if a language is available
print(f"Is 'fr_FR' available? {'fr_FR' in pyphen.LANGUAGES}")
# Hyphenate a word, inserting hyphens
word = 'fromage'
hyphenated_word = dic.inserted(word)
print(f"'{word}' hyphenated: {hyphenated_word}") # Expected: 'fro-mage'
# Wrap a word to a certain width ('autobandventieldopje' is Dutch, so use the Dutch dictionary)
dic_nl = pyphen.Pyphen(lang='nl_NL')
long_word = 'autobandventieldopje'
wrapped_word = dic_nl.wrap(long_word, 11)
print(f"'{long_word}' wrapped to 11 chars: {wrapped_word}") # Expected: ('autoband-', 'ventieldopje')
# Iterate over all possible hyphenation points
print("Hyphenation iterations for 'Amsterdam':")
for pair in dic.iterate('Amsterdam'):
    print(pair)