langcodes
langcodes is a Python library (current version 3.5.1) for parsing, normalizing, and comparing IETF language tags (BCP 47), the standard identifiers for human languages. It draws on ISO 639 and Unicode CLDR for language identification and normalization. The library is actively maintained, with minor versions released periodically.
Warnings
- deprecated The `best_match` function and the `_score` methods (e.g., `Language.match_score`) are deprecated. They rely on older CLDR matching tables, which are less accurate than the distance-based comparisons introduced later. Prefer `closest_match`, `tag_distance`, or `Language.distance`.
- breaking Python 3.8 is no longer supported. Attempting to use `langcodes` on Python 3.8 will likely result in installation failures or unexpected behavior.
- gotcha Functions requiring language names (e.g., `Language.display_name()`) or population data now rely on the optional `language-data` package, which is *not* installed by default as of v3.5.1. Attempting to use these features without the optional package will raise an `ImportError`.
- gotcha The `Language.__hash__` method was reworked in v3.4.1 to account for every component of a language tag. If you relied on specific hash values, or persisted `Language` objects inside sets or dictionaries and reloaded them across the v3.4.1 boundary without rebuilding those containers, lookups may behave unexpectedly because the hash values changed.
- deprecated The `region` parameter, dictionary key, and attribute for `Language` objects were renamed to `territory` to align with CLDR and IANA standards. While some backward compatibility with deprecation warnings exists, relying on `region` is discouraged.
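The `language-data` warning above can be handled defensively. The following is a minimal sketch, assuming the optional package is importable under the module name `language_data`; `has_language_data` and `describe` are hypothetical helper names, not part of langcodes.

```python
from importlib.util import find_spec


def has_language_data() -> bool:
    """Return True if the optional `language_data` module can be imported."""
    return find_spec("language_data") is not None


def describe(tag: str, display_lang: str = "en") -> str:
    """Hypothetical helper: a display name if available, else the normalized tag."""
    # Imported lazily so that has_language_data() works without langcodes installed.
    from langcodes import Language

    lang = Language.get(tag)
    if has_language_data():
        # Needs the optional data package (pip install langcodes[data]).
        return lang.display_name(display_lang)
    # Fall back to the tag itself when names are unavailable.
    return lang.to_tag()
```

Checking up front avoids wrapping every `display_name()` call in its own `try`/`except ImportError`.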
Install
- langcodes
pip install langcodes
- langcodes with language names and statistics (optional)
pip install langcodes[data]
Imports
- Language
from langcodes import Language
- get
from langcodes import get
- best_match (deprecated; prefer closest_match)
from langcodes import best_match
Quickstart
from langcodes import Language, standardize_tag, closest_match
# Parse a language tag
english_us = Language.get('en-US')
print(f"Parsed 'en-US': language={english_us.language}, script={english_us.script}, territory={english_us.territory}")
# Normalize a language tag
normalized_tag = standardize_tag('zh-CN')
print(f"Normalized 'zh-CN': {normalized_tag}")
# Compare languages using distance (lower is better match)
french = Language.get('fr')
canadian_french = Language.get('fr-CA')
print(f"Distance between 'fr' and 'fr-CA': {french.distance(canadian_french)}")
# Find the closest match from a list of supported languages
desired = 'en-GB'
supported = ['en-US', 'en-AU', 'fr-CA']
# closest_match returns a (tag, distance) tuple
closest_tag, distance = closest_match(desired, supported)
print(f"Closest match for '{desired}' in {supported}: {closest_tag} (distance {distance})")
# Get display names (requires 'langcodes[data]' to be installed)
try:
    spanish_name_in_english = Language.get('es').display_name('en')
    print(f"Name of 'es' in English: {spanish_name_in_english}")
except ImportError:
    print("Install 'langcodes[data]' (e.g., pip install langcodes[data]) for language names and statistics.")
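When none of the supported languages is close enough, an application usually wants a fallback. The sketch below wraps `closest_match` in a hypothetical `pick_language` helper. It assumes that `closest_match` signals "no acceptable match" by returning the tag `'und'`; the `matcher` parameter is an illustrative addition so the helper can be exercised with a stand-in function.

```python
from typing import Callable, Optional, Sequence, Tuple

# A matcher takes (desired, supported) and returns (tag, distance).
Matcher = Callable[[str, Sequence[str]], Tuple[str, int]]


def pick_language(
    desired: str,
    supported: Sequence[str],
    default: str,
    matcher: Optional[Matcher] = None,
) -> str:
    """Hypothetical helper: closest supported tag, or `default` if nothing matches."""
    if matcher is None:
        # Imported lazily so a stand-in matcher can be used without langcodes.
        from langcodes import closest_match

        matcher = closest_match
    tag, _distance = matcher(desired, supported)
    # Assumption: 'und' (undetermined) means no supported language was close enough.
    return default if tag == "und" else tag
```

A call such as `pick_language('en-GB', ['en-US', 'en-AU', 'fr-CA'], default='en-US')` then always yields one usable tag rather than a tuple the caller has to inspect.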