tldextract

5.3.1 · active · verified Sat Mar 28

tldextract separates a URL's subdomain, domain, and public suffix using the Public Suffix List (PSL). It handles cases that naive parsing gets wrong, such as multi-label suffixes like co.uk. By default it recognizes only public ICANN suffixes and their exceptions; support for private suffixes is opt-in. The current version is 5.3.1, and the library is actively developed and released.
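To illustrate why suffix-aware parsing matters, here is a minimal standard-library sketch that contrasts a naive "last label is the suffix" split with a longest-known-suffix match. It is not tldextract's implementation: `TOY_SUFFIXES`, `naive_split`, and `suffix_aware_split` are hypothetical names invented for this example, and the three-entry suffix set stands in for the real PSL, whose wildcard and exception rules are not modeled.

```python
# Toy suffix set for illustration only (an assumption, not the real PSL).
TOY_SUFFIXES = {"com", "uk", "co.uk"}


def naive_split(host):
    """Assume the suffix is always the last label (the common mistake)."""
    labels = host.split(".")
    return ".".join(labels[:-2]), labels[-2], labels[-1]


def suffix_aware_split(host, suffixes=TOY_SUFFIXES):
    """Match the longest known suffix; the label before it is the domain."""
    labels = host.lower().split(".")
    # Scanning from the left tries the longest candidate suffix first.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            domain = labels[i - 1] if i > 0 else ""
            subdomain = ".".join(labels[: i - 1]) if i > 1 else ""
            return subdomain, domain, candidate
    # No known suffix: treat the last label as the domain.
    return ".".join(labels[:-1]), labels[-1], ""


print(naive_split("forums.bbc.co.uk"))         # ('forums.bbc', 'co', 'uk')
print(suffix_aware_split("forums.bbc.co.uk"))  # ('forums', 'bbc', 'co.uk')
```

The naive version reports `co` as the domain, which is the kind of error a PSL-based lookup avoids.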

Warnings

Install

Imports

Quickstart

Demonstrates basic URL component extraction using `tldextract.extract()` and accessing the results as attributes of the `ExtractResult` object.

import tldextract

# Basic extraction
extract_result = tldextract.extract('http://forums.news.cnn.com/')
print(f"Subdomain: {extract_result.subdomain}")
print(f"Domain: {extract_result.domain}")
print(f"Suffix: {extract_result.suffix}")
print(f"Full host: {extract_result.fqdn}")

# Private suffixes such as blogspot.com are ignored by default; opt in per call
private_extract = tldextract.extract('waiterrant.blogspot.com', include_psl_private_domains=True)
print(f"\nPrivate domain example: domain={private_extract.domain}, suffix={private_extract.suffix}")
