IDNA — Internationalized Domain Names in Applications

idna 3.11 (verified Tue May 12); auth: no; Python install: verified; quickstart: stale

idna implements the Internationalized Domain Names in Applications protocol (IDNA 2008, RFC 5891) and Unicode IDNA Compatibility Processing (UTS #46) for Python. It is the modern replacement for the standard library's encodings.idna module, which supports only the obsolete IDNA 2003 standard. The current version is 3.11 (Python 3.8+); releases are irregular and mostly driven by security fixes and Unicode data updates.

pip install idna
error UnicodeError: encoding with 'idna' codec failed (UnicodeError: label empty or too long)
cause This error is raised by the standard library's IDNA 2003 codec when a domain label (the text between two dots) is empty or longer than 63 bytes. Common triggers are passing a whole URL instead of a bare hostname, or a hostname with leading or consecutive dots.
fix
Ensure that the string passed to the IDNA encoding function represents a valid hostname, not an entire URL, and that each domain label (segment between dots) is no longer than 63 bytes. For URLs, parse out the hostname component before applying IDNA encoding.
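A minimal sketch of the hostname-first approach, assuming the third-party idna package is installed; `hostname_to_ascii` is a hypothetical helper name. `urllib.parse.urlsplit` discards the scheme, port, path, and query, so only the hostname reaches the encoder, and idna reports empty or over-long labels as `idna.IDNAError`.

```python
from urllib.parse import urlsplit

import idna


def hostname_to_ascii(url: str) -> bytes:
    """Hypothetical helper: IDNA-encode only the hostname part of a URL."""
    host = urlsplit(url).hostname  # scheme, port, path and query are dropped
    if not host:
        raise ValueError(f"no hostname in {url!r}")
    # uts46=True applies UTS #46 mapping (e.g. case folding) first;
    # empty or over-long (> 63 byte) labels raise idna.IDNAError.
    return idna.encode(host, uts46=True)


print(hostname_to_ascii("https://ドメイン.テスト:8443/path?q=1"))  # b'xn--eckwd4c7c.xn--zckzah'
```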
error LookupError: unknown encoding: idna
cause This error typically indicates that the 'idna' codec is not properly registered or available in the Python environment, which can happen with embedded Python distributions or when `encodings.idna` isn't implicitly loaded, particularly in multi-threaded applications.
fix
Explicitly import encodings.idna at application startup, or perform a no-op operation such as b''.decode('idna') or ''.encode('idna') early in your code, before worker threads start. Importing the third-party idna package does not register a codec named 'idna' (import idna.codec registers 'idna2008' instead); with that package, call idna.encode() and idna.decode() directly.
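A stdlib-only sketch of the startup preload; `preload_idna_codec` is a hypothetical name, and calling it before any threads are created is the assumption that makes it useful.

```python
import codecs
import encodings.idna  # noqa: F401  loads the stdlib IDNA 2003 codec module


def preload_idna_codec() -> None:
    """Resolve the 'idna' codec once, e.g. at startup before threads start."""
    codecs.lookup("idna")  # raises LookupError if the codec is unavailable
    b"".decode("idna")     # no-op round trips also force the lookup
    "".encode("idna")


preload_idna_codec()
print(codecs.lookup("idna").name)  # idna
```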
error idna.IDNAError
cause idna.IDNAError (a subclass of UnicodeError) is the base exception raised when a domain name violates IDNA 2008 or UTS #46 rules. More specific subclasses cover disallowed characters (InvalidCodepoint), violations of the bidirectional text rules (IDNABidiError), and invalid character contexts (InvalidCodepointContext).
fix
Catch the specific subclasses (idna.IDNABidiError, idna.InvalidCodepoint, idna.InvalidCodepointContext) before the idna.IDNAError base class to identify the exact violation. Structural problems such as empty labels or labels longer than 63 bytes raise the base class itself.
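One way to sketch that handler order, assuming idna 3.x with the subclasses exported at package level; `describe_idna_failure` is a hypothetical helper. The `except` clauses list the subclasses first because the base-class clause would otherwise catch everything.

```python
import idna


def describe_idna_failure(domain: str) -> str:
    """Return which IDNA rule a domain breaks, or 'ok' if it encodes."""
    try:
        idna.encode(domain)  # strict IDNA 2008, no UTS #46 mapping
    except idna.IDNABidiError as exc:            # bidirectional text rules
        return f"bidi: {exc}"
    except idna.InvalidCodepointContext as exc:  # CONTEXTJ/CONTEXTO rules
        return f"context: {exc}"
    except idna.InvalidCodepoint as exc:         # disallowed code point
        return f"codepoint: {exc}"
    except idna.IDNAError as exc:                # base class: structural errors
        return f"other: {exc}"
    return "ok"


for name in ["例え.テスト", "Königsgäßchen", "a..b"]:
    print(name, "->", describe_idna_failure(name))
```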
breaking idna v3.0 dropped Python 2 support entirely. Use 'idna<3' in requirements for Python 2 applications.
fix Pin to idna<3 for Python 2. For Python 3, use idna>=3.0.
breaking The string codec name is 'idna2008', NOT 'idna'. After 'import idna.codec', use str.encode('idna2008'). Using 'idna' as the codec name silently invokes the stdlib IDNA 2003 codec and produces wrong results for many domains.
fix import idna.codec; domain.encode('idna2008')
breaking Dot-prefixed domains (e.g. '.example.com') are no longer accepted as valid. They raise IDNAError. Strip leading dots before calling encode().
fix domain = domain.lstrip('.')
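A quick illustration, assuming idna 3.x: the leading dot yields an empty first label, while `lstrip('.')` leaves a trailing root dot in place, which idna accepts.

```python
import idna

raw = ".example.com"

try:
    idna.encode(raw)  # leading dot produces an empty first label
except idna.IDNAError as exc:
    print("rejected:", exc)

# lstrip('.') removes only leading dots; a trailing root dot survives.
print(idna.encode(raw.lstrip(".")))             # b'example.com'
print(idna.encode("example.com.".lstrip(".")))  # b'example.com.'
```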
deprecated The 'transitional' keyword argument to encode() no longer has any effect because Unicode 16.0.0 removed transitional processing. It will be removed in a future release.
fix Remove the transitional=True/False argument from all encode() calls.
gotcha Uppercase and mixed-case labels raise InvalidCodepoint in strict (default) IDNA 2008 mode. Pass uts46=True to enable UTS #46 case-folding, which silently lowercases the input before conversion.
fix idna.encode(domain, uts46=True) for user-supplied domains; idna.encode(domain) only for already-normalized lowercase labels.
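A sketch of a normalizing entry point for user-typed names; `to_ascii_hostname` is a hypothetical helper, and stripping surrounding whitespace is an added assumption about what user input looks like.

```python
import idna


def to_ascii_hostname(user_input: str) -> bytes:
    """Hypothetical helper for untrusted, user-typed domain names."""
    # UTS #46 mapping lowercases (and otherwise normalizes) the input
    # before the IDNA 2008 checks run, so mixed-case input is accepted.
    return idna.encode(user_input.strip(), uts46=True)


print(to_ascii_hostname("  Königsgäßchen "))  # b'xn--knigsgchen-b4a3dun'
print(to_ascii_hostname("EXAMPLE.com"))       # b'example.com'
```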
gotcha CVE-2024-3651 (fixed in v3.7): specially crafted arguments to encode() triggered quadratic-complexity processing, allowing a CPU-exhaustion denial of service. Any deployment that handles untrusted input must run idna>=3.7.
fix Upgrade to idna>=3.7 immediately if processing untrusted domain names.
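A stdlib-only sketch of a startup check; `is_patched` and `MIN_PATCHED` are hypothetical names. It compares only the major.minor part of the version string, so `packaging.version.Version` is the better choice where full PEP 440 comparison matters.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_PATCHED = (3, 7)  # first idna release with the CVE-2024-3651 fix


def is_patched(idna_version: str) -> bool:
    """Compare only the major.minor part of a version string with 3.7."""
    match = re.match(r"(\d+)\.(\d+)", idna_version)
    if match is None:
        raise ValueError(f"unrecognized version string: {idna_version!r}")
    return (int(match.group(1)), int(match.group(2))) >= MIN_PATCHED


try:
    installed = version("idna")
except PackageNotFoundError:
    installed = None

if installed is None:
    print("idna is not installed")
elif is_patched(installed):
    print(f"idna {installed}: includes the CVE-2024-3651 fix")
else:
    print(f"idna {installed}: vulnerable, upgrade to idna>=3.7")
```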
gotcha Emoji and symbol domains are expressly prohibited by IDNA 2008 and will raise an exception. There is no flag to allow them. Fall back to encodings.idna (IDNA 2003) only as a last resort for legacy emoji domains.
fix Catch IDNAError and fall back to encodings.idna only when emoji/symbol domain support is explicitly required.
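A sketch of an opt-in fallback, assuming idna 3.x; `encode_with_legacy_fallback` and its `allow_idna2003` flag are hypothetical. The snowman domain is rejected by IDNA 2008 but still encodes under the stdlib IDNA 2003 codec.

```python
import idna


def encode_with_legacy_fallback(domain: str, allow_idna2003: bool = False) -> bytes:
    """Hypothetical helper: IDNA 2008 first, IDNA 2003 only when opted in."""
    try:
        return idna.encode(domain, uts46=True)
    except idna.IDNAError:
        if not allow_idna2003:
            raise
        # Last resort: the stdlib IDNA 2003 codec still accepts many symbols.
        return domain.encode("idna")


print(encode_with_legacy_fallback("例え.テスト"))                   # b'xn--r8jz45g.xn--zckzah'
print(encode_with_legacy_fallback("☃.net", allow_idna2003=True))  # b'xn--n3h.net'
```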
gotcha A NameError for 'ace_label' (with the interpreter suggesting 'acе_label') is a Python error, not an idna one. The suggested name looks identical because the assignment spells it with a visually similar non-ASCII character, here Cyrillic 'е' (U+0435) instead of Latin 'e', so Python treats the two spellings as different identifiers. Such homoglyphs often arrive in copied snippets.
fix Define every variable before use (e.g. 'ace_label = idna.encode(domain_name)' before 'print(ace_label)'), and retype suspicious identifiers using ASCII characters only.
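A small stdlib-only sketch for spotting such homoglyphs before running a snippet; `non_ascii_names` is a hypothetical helper built on `tokenize` and `unicodedata`.

```python
import io
import tokenize
import unicodedata


def non_ascii_names(source: str):
    """Yield (line, identifier, details) for identifiers with non-ASCII characters."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    for tok in tokens:
        if tok.type == tokenize.NAME and not tok.string.isascii():
            details = [
                f"{ch!r} U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}"
                for ch in tok.string
                if not ch.isascii()
            ]
            yield tok.start[0], tok.string, details


snippet = "ac\u0435_label = 1\nprint(ace_label)\n"  # line 1 uses Cyrillic U+0435
for line, name, details in non_ascii_names(snippet):
    print(line, name, details)
```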
python  os / libc      wheel  install  import  disk
3.9     alpine (musl)  -      -        0.02s   17.8M
3.9     slim (glibc)   -      -        0.01s   18M
3.10    alpine (musl)  -      -        0.03s   18.5M
3.10    slim (glibc)   -      -        0.04s   19M
3.11    alpine (musl)  -      -        0.04s   20.2M
3.11    slim (glibc)   -      -        0.04s   21M
3.12    alpine (musl)  -      -        0.04s   12.1M
3.12    slim (glibc)   -      -        0.04s   13M
3.13    alpine (musl)  -      -        0.04s   11.8M
3.13    slim (glibc)   -      -        0.03s   12M

Encode a Unicode domain to ASCII-compatible encoding (A-label) and decode back; handle mixed-case input with uts46=True.

import idna

# Encode a Unicode domain to ACE/A-label bytes
ace_label = idna.encode('ドメイン.テスト')
print(ace_label)          # b'xn--eckwd4c7c.xn--zckzah'

# Decode ACE back to Unicode
unicode_domain = idna.decode('xn--eckwd4c7c.xn--zckzah')
print(unicode_domain)     # ドメイン.テスト

# Capital letters are rejected by IDNA 2008 strict mode;
# pass uts46=True to enable UTS #46 case-folding pre-processing.
ace_uts46 = idna.encode('Königsgäßchen', uts46=True)
print(ace_uts46)          # b'xn--knigsgchen-b4a3dun'

# Per-label helpers
print(idna.alabel('例え'))   # b'xn--r8jz45g'
print(idna.ulabel(b'xn--r8jz45g'))  # 例え

# Codec interface (import idna.codec to register it first)
import idna.codec
encoded = '例え.jp'.encode('idna2008')   # codec name is 'idna2008', NOT 'idna'
print(encoded)            # b'xn--r8jz45g.jp'

# Error handling
try:
    idna.encode('Königsgäßchen')   # strict mode rejects uppercase
except idna.InvalidCodepoint as e:   # subclass of idna.IDNAError
    print(f'Encoding error: {e}')