URL Normalize
URL Normalize is a Python library (current version 2.2.1) for standardizing and normalizing URLs, including internationalized domain names (IDN). It lowercases the scheme and host, normalizes percent-encoding, collapses path segments such as `.` and `..`, and supports a configurable default scheme and optional query parameter filtering. It is actively maintained with regular releases.
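To make the normalization steps named above concrete, here is a minimal standard-library sketch of three of them: scheme and host lowercasing, default port removal, and dot-segment collapsing. It is not the library's implementation; `sketch_normalize` and `DEFAULT_PORTS` are hypothetical names used only for illustration, and the sketch ignores userinfo, IPv6 literals, IDN, and percent-encoding.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit

# Hypothetical table of default ports; not taken from url-normalize itself.
DEFAULT_PORTS = {"http": 80, "https": 443}


def sketch_normalize(url: str) -> str:
    """Illustrative only: lowercase scheme/host, drop default port, collapse dot segments."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Keep the port only when it differs from the scheme's default.
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        host = f"{host}:{parts.port}"
    # normpath resolves "." and ".." but strips a trailing slash, so restore it.
    path = posixpath.normpath(parts.path) if parts.path else "/"
    if parts.path.endswith("/") and not path.endswith("/"):
        path += "/"
    return urlunsplit((scheme, host, path, parts.query, parts.fragment))


print(sketch_normalize("HTTP://www.Example.com:80/a/../b?c=2&c=1"))
# http://www.example.com/b?c=2&c=1
```

The query string is passed through untouched here. For real use, prefer `url_normalize` itself, which also covers the cases this sketch skips.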
Warnings
- breaking The default scheme for URLs without one has changed from 'http' to 'https'.
- breaking The minimum required Python version has been updated to 3.8.
- breaking IDNA handling has been migrated to use IDNA 2008 with UTS46 processing, requiring the `idna` package.
- breaking The `sort_query_params` option has been removed as it was deemed incorrect in its previous implementation.
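If existing code relied on sorted query parameters, one possible workaround is to sort them in a separate step using only the standard library. The sketch below assumes that reordering is safe for your URLs, which is not true for servers that treat parameter order as significant. `sort_query` is a hypothetical helper, not part of url-normalize.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def sort_query(url: str) -> str:
    """Hypothetical helper: return url with query parameters sorted by key, then value."""
    parts = urlsplit(url)
    # keep_blank_values=True so parameters like "?flag=" are not silently dropped.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    return urlunsplit(parts._replace(query=urlencode(sorted(pairs))))


print(sort_query("http://www.example.com/b?c=2&c=1&a=3"))
# http://www.example.com/b?a=3&c=1&c=2
```

Note that `urlencode` re-encodes values (for example, spaces become `+`), so the output may differ from the input in more than ordering.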
Install
pip install url-normalize
Imports
- url_normalize
from url_normalize import url_normalize
Quickstart
from url_normalize import url_normalize
# Basic normalization
normalized_url = url_normalize("HTTP://www.Example.com:80/a/../b?c=2&c=1")
print(f"Normalized URL: {normalized_url}")
# Expected: http://www.example.com/b?c=2&c=1 (the explicit http scheme is kept; query parameters are not reordered)
# Normalizing a path with a default domain and scheme
normalized_path = url_normalize("/foo/bar", default_domain="example.com", default_scheme="http")
print(f"Normalized path with domain: {normalized_path}")
# Expected: http://example.com/foo/bar