langcodes
langcodes is a Python library (current version 3.5.1) for parsing, manipulating, and comparing IETF language tags (BCP 47), which identify human languages. It provides robust language-tag normalization and matching, drawing on standards such as ISO 639 and the Unicode CLDR.
Common errors
- ModuleNotFoundError: No module named 'language_data'
  Cause: Certain features of `langcodes`, particularly language names and detailed language data, require the optional `language_data` package, which is not installed by default.
  Fix: Install it separately, or install `langcodes` with its 'data' extra: `pip install language_data` or `pip install langcodes[data]`.
- langcodes.tag_parser.LanguageTagError: This script subtag, 'latn', is out of place. Expected variant, extension, or end of string.
  Cause: The tag string is malformed or violates the BCP 47 subtag order, for example by placing a script subtag after a territory subtag.
  Fix: Follow the BCP 47 subtag order (language-script-territory-variant): 'spa-latn-mx' parses, while 'spa-mx-latn' does not. The normalized tag would be `es-Latn-MX`, or simply `es-MX`.
- LookupError: 'un' is not a known language code, and has no alpha3 code.
  Cause: An ISO 639-2 (alpha3) code was requested for a language tag that is unknown, invalid, or has no 3-letter mapping in the `langcodes` database.
  Fix: Verify that the code is a valid IANA-registered language subtag; private-use tags (e.g., 'x-private') have no standard alpha3 code. For a known code, the lookup succeeds: `Language.get('en').to_alpha3()` returns 'eng'.
- AttributeError: 'Language' object has no attribute 'region'
  Cause: The attribute for a region code was renamed from `region` to `territory` to align with Unicode CLDR terminology, so code written against the old name raises this error on current versions.
  Fix: Use the `territory` attribute instead of `region`. For example, change `lang.region` to `lang.territory`.
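The fixes above can be sketched together in a few lines; this assumes `langcodes` 3.x is installed:

```python
from langcodes import Language

# Script comes before territory in a BCP 47 tag: language-script-territory.
tag = Language.get('es-Latn-MX')
print(tag.language, tag.script, tag.territory)  # es Latn MX

# A known language code maps to an ISO 639-2 alpha3 code.
print(Language.get('en').to_alpha3())  # eng

# Use .territory; the old .region attribute no longer exists.
print(Language.get('pt-BR').territory)  # BR
```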
Warnings
- deprecated The `best_match` function and the `match_score` methods (e.g., `Language.match_score`) are deprecated in favor of the distance-based API (`closest_match`, `tag_distance`). They rely on older CLDR matching tables, which are less accurate than the distance-based comparisons introduced later.
- breaking Python 3.8 is no longer supported. Attempting to use `langcodes` on Python 3.8 will likely result in installation failures or unexpected behavior.
- gotcha Functions requiring language names (e.g., `Language.display_name()`) or population data now rely on the optional `language-data` package, which is *not* installed by default as of v3.5.1. Attempting to use these features without the optional package will raise an `ImportError`.
- gotcha The `Language.__hash__` method was reworked in v3.4.1 to correctly account for all language variations. Hash values changed, so any persisted structure keyed by `Language` hashes, or code that relied on specific hash values, must be rebuilt after upgrading.
- deprecated The `region` parameter, dictionary key, and attribute for `Language` objects were renamed to `territory` to align with CLDR and IANA standards. While some backward compatibility with deprecation warnings exists, relying on `region` is discouraged.
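In place of the deprecated `best_match`, the distance-based API can be used as follows (a minimal sketch, assuming `langcodes` 3.x): `closest_match` returns a `(tag, distance)` pair, and `tag_distance` scores a single desired/supported pair on the same scale (lower is better).

```python
from langcodes import closest_match, tag_distance

# closest_match picks the best supported tag and reports its distance.
best_tag, distance = closest_match('pt', ['pt-BR', 'pt-PT', 'es'])
print(best_tag, distance)

# tag_distance compares two tags directly on a 0-134 scale;
# values up to about 25 are usually acceptable matches.
print(tag_distance('en-GB', 'en-US'))
```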
Install
- pip install langcodes
Imports
- Language
from langcodes import Language
- get
from langcodes import get
- best_match (deprecated; prefer closest_match)
from langcodes import best_match
- closest_match
from langcodes import closest_match
- standardize_tag
from langcodes import standardize_tag
Quickstart
from langcodes import Language, standardize_tag, closest_match
# Parse a language tag
english_us = Language.get('en-US')
print(f"Parsed 'en-US': language={english_us.language}, script={english_us.script}, territory={english_us.territory}")
# Normalize a language tag
normalized_tag = standardize_tag('zh-CN')
print(f"Normalized 'zh-CN': {normalized_tag}")
# Compare languages using distance (lower is better match)
french = Language.get('fr')
canadian_french = Language.get('fr-CA')
print(f"Distance between 'fr' and 'fr-CA': {french.distance(canadian_french)}")
# Find the closest match from a list of supported languages
desired = 'en-GB'
supported = ['en-US', 'en-AU', 'fr-CA']
closest, distance = closest_match(desired, supported)  # returns a (tag, distance) pair
print(f"Closest match for '{desired}' in {supported}: {closest} (distance {distance})")
# Get display names (requires 'langcodes[data]' to be installed)
try:
    spanish_name_in_english = Language.get('es').display_name('en')
    print(f"Name of 'es' in English: {spanish_name_in_english}")
except ImportError:
    print("Install 'langcodes[data]' (e.g., pip install langcodes[data]) for language names and statistics.")