tldextract
tldextract accurately separates a URL's subdomain, domain, and public suffix using the Public Suffix List (PSL). It handles edge cases that naive parsing misses, such as multi-label suffixes like `co.uk`. By default, it recognizes public ICANN TLDs and their exceptions; support for private domains is opt-in. The current version is 5.3.1, and the library is actively developed and released.
Warnings
- breaking The `ExtractResult` object changed from a `namedtuple` to a `dataclass` in v5.0.0. This means direct indexing, slicing, or unpacking the result object will raise a `TypeError`.
- breaking The `ExtractResult` object gained a fourth field, `is_private: bool`, in v4.0.0, while it was still a `namedtuple`. Code running on v4.x that unpacks the result into exactly 3 names will break.
- deprecated The `registered_domain` property on `ExtractResult` was deprecated in v5.3.0 in favor of `top_domain_under_public_suffix`. It will be removed in a future major version.
- breaking Support for Python 3.8 was dropped in v5.1.3, and support for Python 3.9 was dropped in v5.3.1. The library now requires Python 3.10 or newer.
- gotcha On its first call, `tldextract` fetches the latest Public Suffix List via a live HTTP request and caches it indefinitely in `$HOME/.cache/python-tldextract`. The first call can therefore be slow, and it adds a network and filesystem dependency that sandboxed, offline, or read-only environments may not expect.
- gotcha `tldextract` is lenient and performs minimal URL validation. It will attempt to extract components from any string, including partial or malformed URLs, prioritizing ease of use over strict validation.
Install
pip install tldextract
Imports
- tldextract
import tldextract
- ExtractResult
from tldextract import ExtractResult
- update
from tldextract import update
Quickstart
import tldextract
# Basic extraction
extract_result = tldextract.extract('http://forums.news.cnn.com/')
print(f"Subdomain: {extract_result.subdomain}")
print(f"Domain: {extract_result.domain}")
print(f"Suffix: {extract_result.suffix}")
print(f"Full host: {extract_result.fqdn}")
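# Example with a private suffix: private PSL entries are ignored unless
# include_psl_private_domains=True is passed
private_extract = tldextract.extract('waiterrant.blogspot.com', include_psl_private_domains=True)
print(f"\nPrivate domain example: domain={private_extract.domain}, suffix={private_extract.suffix}, is_private={private_extract.is_private}")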