Parsing and validation of URIs (RFC 3986) and IRIs (RFC 3987)
The `rfc3987` Python library provides regular expressions and utilities for parsing and validating Uniform Resource Identifiers (URIs) per RFC 3986 and Internationalized Resource Identifiers (IRIs) per RFC 3987. The current version is 1.3.8. The original GitHub repository is archived and development has reportedly moved to Codeberg, so new releases on PyPI should be expected rarely, if at all.
Warnings
- breaking The original GitHub repository for `rfc3987` has been archived by its owner, and development has reportedly moved to Codeberg. The PyPI package may therefore receive no further updates from the original author, which could leave problems with newer Python versions or RFC amendments unaddressed.
- gotcha The capabilities of `parse` and `match` depend on whether the optional `regex` package is installed. With `regex` available, any RFC 3986/3987 rule can be used. With only Python's built-in `re` module, only certain special-cased rules (such as 'IRI_reference', 'IRI', 'absolute_IRI', 'URI_reference', and 'URI') are supported.
- deprecated The PyPI metadata lists only Python 2.x and Python 3.2-3.6 as tested versions. The library generally works on newer Python 3 releases, but support for Python 3.7+ is neither documented nor explicitly tested, given the lack of recent updates and the archived status of the original repository.
- gotcha For older Python versions (<=3.2), characters beyond the Basic Multilingual Plane (BMP) might not be fully supported on 'narrow' Python builds, which can lead to issues when processing internationalized characters in IRIs. This is a Python interpreter limitation, not specific to `rfc3987` itself.
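The two environment-dependent gotchas above (the optional `regex` package and narrow Unicode builds) can be checked up front. The following is a minimal sketch using only the standard library; `describe_environment` is a hypothetical helper name, not part of `rfc3987`.

```python
import importlib.util
import sys


def describe_environment():
    """Report two interpreter properties that affect rfc3987 (hypothetical helper)."""
    return {
        # True if the optional third-party `regex` package can be imported
        "regex_available": importlib.util.find_spec("regex") is not None,
        # Narrow builds (possible up to Python 3.2) cap sys.maxunicode at 0xFFFF
        "wide_unicode": sys.maxunicode > 0xFFFF,
    }


env = describe_environment()
if env["regex_available"]:
    print("regex found: any RFC 3986/3987 rule name should be usable")
else:
    print("regex not found: use special-cased rules such as 'IRI' or 'URI_reference'")
print("wide unicode build:", env["wide_unicode"])
```

On Python 3.3 and later the second check should always report a wide build, so it matters only for legacy interpreters.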
Install
pip install rfc3987
Imports
- match
from rfc3987 import match
- parse
from rfc3987 import parse
- compose
from rfc3987 import compose
- resolve
from rfc3987 import resolve
Quickstart
from rfc3987 import parse, match, resolve
# Parse an IRI and get its components
iri_string = 'http://example.com/path?query=value#fragment'
parsed_iri = parse(iri_string, rule='IRI')
print(f"Parsed Scheme: {parsed_iri.get('scheme')}")
print(f"Parsed Authority: {parsed_iri.get('authority')}")
print(f"Parsed Path: {parsed_iri.get('path')}")
# Check if a string matches a specific rule
is_pct_encoded = match('%C7', 'pct_encoded')
print(f"Is '%C7' percent-encoded? {bool(is_pct_encoded)}")
# Resolve a relative URI reference
base_uri = 'http://a/b/c/d;p?q'
relative_ref = '../../g'
resolved_uri = resolve(base_uri, relative_ref)
print(f"Resolved URI: {resolved_uri}")