rfc3986
rfc3986 is a Python implementation of RFC 3986 that includes validation and authority parsing. It also supports RFC 6874, which adds zone identifiers to IPv6 addresses. The package provides APIs for parsing, validating, and building URIs, plus convenience methods for compatibility with `urllib.parse`. The current version is 2.0.0, released in January 2022, and the project appears to be actively maintained.
Warnings
- gotcha rfc3986 adheres strictly to RFC 3986, so malformed or non-standard URIs can parse differently than they do with `urllib.parse`. In particular, an authority component must be preceded by `//`; without it, `rfc3986` may interpret the text as part of the path.
- gotcha Support for Internationalized Resource Identifiers (IRIs), as defined in RFC 3987, is limited. The package exposes `iri_reference`, but its parsing, validation, and building APIs are designed around RFC 3986 URIs.
- gotcha The `is_valid()` method on the object returned by `uri_reference` might, in some edge cases, accept invalid hostnames or out-of-range port numbers (according to open GitHub issues).
- gotcha The `copy_with` method on parsed URI objects (from `uri_reference` or `urlparse`) replaces existing components rather than extending them. For example, adding a path segment replaces the entire path.
Install
-
pip install rfc3986
Imports
- uri_reference
from rfc3986 import uri_reference
- urlparse
from rfc3986 import urlparse
- validators
from rfc3986 import validators
Quickstart
from rfc3986 import uri_reference, validators
# Parsing a URI Reference
uri_str = 'https://user:pass@example.com:8080/path/to/resource?key=value#fragment'
uri = uri_reference(uri_str)
print(f"Scheme: {uri.scheme}") # Output: https
print(f"Host: {uri.host}") # Output: example.com
print(f"Path: {uri.path}") # Output: /path/to/resource
print(f"Query: {uri.query}") # Output: key=value
# Validating a URI (validate() returns None on success and raises on failure)
from rfc3986.exceptions import ValidationError
validator = validators.Validator().allow_schemes('https').allow_hosts('example.com')
try:
    validator.validate(uri)
    print("URI is valid according to custom rules.")
except ValidationError:
    print("URI is NOT valid according to custom rules.")
# Building a URI
from rfc3986.builder import URIBuilder
builder = (URIBuilder()
           .add_scheme('mailto')
           .add_path('user@domain.com'))
mailto_uri = builder.finalize()
print(f"Built URI: {mailto_uri.unsplit()}") # Output: mailto:user@domain.com