IDNA — Internationalized Domain Names in Applications
idna implements the Internationalized Domain Names in Applications (IDNA 2008, RFC 5891) protocol and Unicode IDNA Compatibility Processing (UTS #46) for Python. It is the modern replacement for the built-in encodings.idna module, which only supports the obsolete IDNA 2003 standard. The current version is 3.11 (requires Python 3.8+). Releases are irregular but ongoing, driven mostly by security patches and Unicode data updates.
Warnings
- breaking idna v3.0 dropped Python 2 support entirely. Use 'idna<3' in requirements for Python 2 applications.
- breaking The string codec name is 'idna2008', NOT 'idna'. After 'import idna.codec', use str.encode('idna2008'). Using 'idna' as the codec name silently invokes the stdlib IDNA 2003 codec and produces wrong results for many domains.
- breaking Dot-prefixed domains (e.g. '.example.com') are no longer accepted as valid. They raise IDNAError. Strip leading dots before calling encode().
- deprecated The 'transitional' keyword argument to encode() no longer has any effect because Unicode 16.0.0 removed transitional processing. It will be removed in a future release.
- gotcha Uppercase and mixed-case labels raise InvalidCodepoint in strict (default) IDNA 2008 mode. Pass uts46=True to enable UTS #46 case-folding, which silently lowercases the input before conversion.
- gotcha CVE-2024-3651 (fixed in v3.7): specially crafted inputs to encode() could trigger quadratic-time processing and exhaust CPU, a denial-of-service risk. Any deployment that handles untrusted input must run idna>=3.7.
- gotcha Emoji and symbol domains are expressly prohibited by IDNA 2008 and will raise an exception. There is no flag to allow them. Fall back to encodings.idna (IDNA 2003) only as a last resort for legacy emoji domains.
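Several of the warnings above (leading dots, uppercase input, prohibited code points) can be handled in one place. The following is a minimal sketch, not part of idna itself: the helper name safe_encode is hypothetical, and returning None for rejected names is an assumption about what a caller might want.

```python
from typing import Optional

import idna


def safe_encode(domain: str) -> Optional[bytes]:
    """Return the A-label form of `domain`, or None if idna rejects it.

    Hypothetical helper for illustration; not provided by the idna package.
    """
    # Dot-prefixed names are rejected, so drop leading dots up front.
    cleaned = domain.lstrip(".")
    try:
        # uts46=True applies UTS #46 mapping (including lowercasing) first.
        return idna.encode(cleaned, uts46=True)
    except idna.IDNAError:
        # InvalidCodepoint and the other idna errors subclass IDNAError.
        return None


print(safe_encode(".Königsgäßchen"))  # b'xn--knigsgchen-b4a3dun'
print(safe_encode("☃.example"))       # None (symbols are disallowed)
```

Catching IDNAError rather than InvalidCodepoint alone keeps the helper from leaking other validation failures, such as an empty name, to the caller.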
Install
pip install idna
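Because the CVE-2024-3651 fix only exists from 3.7 onward, an application may want to confirm the installed version at startup. This is a rough sketch using the standard library's importlib.metadata; the function names release_tuple and idna_has_cve_fix are hypothetical, and the parser deliberately ignores pre-release suffixes.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple


def release_tuple(version_string: str) -> Tuple[int, ...]:
    """Turn '3.11' into (3, 11), ignoring any pre-release or local suffix."""
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError(f"unrecognised version: {version_string!r}")
    return tuple(int(part) for part in match.group().split("."))


def idna_has_cve_fix() -> bool:
    """True if the installed idna is 3.7 or newer; False if older or absent."""
    try:
        installed = version("idna")
    except PackageNotFoundError:
        return False
    return release_tuple(installed) >= (3, 7)


# Compare integer tuples: as plain strings, "3.11" would sort before "3.7".
print(release_tuple("3.11") >= (3, 7))  # True
print(idna_has_cve_fix())
```

Pinning 'idna>=3.7' in the requirements file achieves the same thing at install time; the runtime check is only a second line of defence.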
Imports
- encode
import idna; idna.encode('例え.jp')
- IDNAError
from idna.core import IDNAError, InvalidCodepoint
- codec
import idna.codec; '例え.jp'.encode('idna2008')
- compat
import idna.compat
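The compat entry is listed without an example. As a hedged sketch, assuming the shim exposes ToASCII and ToUnicode as described in the upstream README (names modelled on the stdlib encodings.idna functions), it can be used like this:

```python
import idna.compat

# ToASCII / ToUnicode are thin wrappers over idna.encode / idna.decode.
ace = idna.compat.ToASCII('例え.jp')
print(ace)                          # b'xn--r8jz45g.jp'
print(idna.compat.ToUnicode(ace))   # 例え.jp
```

This is mainly useful when porting code that was written against encodings.idna; new code can call idna.encode() and idna.decode() directly.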
Quickstart
import idna
# Encode a Unicode domain to ACE/A-label bytes
ace_label = idna.encode('ドメイン.テスト')
print(ace_label) # b'xn--eckwd4c7c.xn--zckzah'
# Decode ACE back to Unicode
unicode_domain = idna.decode('xn--eckwd4c7c.xn--zckzah')
print(unicode_domain) # ドメイン.テスト
# Capital letters are rejected by IDNA 2008 strict mode;
# pass uts46=True to enable UTS #46 case-folding pre-processing.
ace_uts46 = idna.encode('Königsgäßchen', uts46=True)
print(ace_uts46) # b'xn--knigsgchen-b4a3dun'
# Per-label helpers
print(idna.alabel('例え')) # b'xn--r8jz45g'
print(idna.ulabel(b'xn--r8jz45g')) # 例え
# Codec interface (import idna.codec to register it first)
import idna.codec
encoded = '例え.jp'.encode('idna2008') # codec name is 'idna2008', NOT 'idna'
print(encoded) # b'xn--r8jz45g.jp'
# Error handling
try:
    idna.encode('Königsgäßchen')  # strict mode rejects uppercase
except idna.core.InvalidCodepoint as e:
    print(f'Encoding error: {e}')