can-ada: Fast Spec-Compliant URL Parser
can-ada is a Python wrapper for the Ada C++ library, a fast URL parser that follows the WHATWG URL Standard. The current version is 3.0.0; releases typically track updates to the underlying Ada C++ library.
Common errors
- ImportError: cannot import name 'URL' from 'can_ada'
  cause: The requested name could not be found in `can_ada`, usually because of a typo in the name or in the import path.
  fix: Use `from can_ada import URL` (or `parse_url`, `URLSearchParams`); these are top-level exports.
- ValueError: Invalid URL
  cause: The input string passed to the `URL` constructor or the `parse_url` function does not conform to the WHATWG URL Standard.
  fix: Verify that the URL string is correctly formatted (e.g., it includes a scheme such as 'http://' and contains only valid characters).
- AttributeError: 'URL' object has no attribute 'domain'
  cause: The code accesses a URL component that does not exist. `can-ada` follows the WHATWG URL Standard, which uses `hostname` or `host` instead of `domain`.
  fix: Use `url_object.hostname` for the domain without the port, or `url_object.host` for the domain with the port (if present).
- TypeError: argument 'base_input': 'NoneType' object cannot be converted to 'str'
  cause: `None` was passed as the `base_input` argument to the `URL` constructor, which expects a string to resolve the URL against.
  fix: Pass `base_input` only as a string, or omit it when no base URL is required.
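The `ValueError` and `TypeError` fixes above can be combined in a small wrapper. The following is a minimal sketch, not part of can-ada: `safe_parse` and `fake_parser` are hypothetical names, and the parser is passed in as an argument so the example runs without the library installed. It assumes only what this page states, namely that invalid input raises `ValueError` and that the base must be a string when given.

```python
from typing import Any, Callable, Optional


def safe_parse(parser: Callable[..., Any], text: str,
               base: Optional[str] = None) -> Optional[Any]:
    """Call `parser`, omitting the base when it is None.

    Returns None instead of raising when the parser rejects the input.
    """
    try:
        if base is None:
            # Never forward None as the base: leave the argument out.
            return parser(text)
        return parser(text, base)
    except ValueError:
        # This page reports ValueError for input that is not a valid URL.
        return None


def fake_parser(text: str, base: str = "") -> str:
    """Hypothetical stand-in for a URL parser (NOT can-ada's behavior)."""
    if "://" not in text and not base:
        raise ValueError("Invalid URL")
    return text if "://" in text else base + text


print(safe_parse(fake_parser, "https://example.com/"))
print(safe_parse(fake_parser, "not a url"))
print(safe_parse(fake_parser, "/path", "https://example.com"))
```

With can-ada installed, the same wrapper could presumably be called as `safe_parse(URL, text)`, assuming the `URL` constructor accepts the base as its second positional argument; that detail is not confirmed by this page.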
Warnings
- breaking Official support for Python 3.7 and 3.8 was removed in `can-ada` v2.0.0. While it might still work, stability and future compatibility are not guaranteed.
- gotcha `can-ada` v3.0.0 switched its binding library from `pybind11` to `nanobind`, which reduced call overhead and made the library 32% faster. Code that relies on `pybind11`-specific internals or on highly specialized C-extension interactions might observe subtle changes; standard Python usage should be unaffected.
- gotcha The `URLSearchParams` object (introduced in v1.3.0) provides a structured way to read and modify URL query parameters. Prefer it over editing the `URL.search` string directly, which can produce malformed URLs when values are not encoded correctly.
- gotcha A relative reference such as `/path` has no scheme or host of its own. Pass the optional `base_input` argument to the `URL` constructor to resolve it against a base URL; without a base, parsing can fail or give unexpected results.
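To illustrate why editing a query string by hand is risky, here is a minimal sketch that uses Python's standard `urllib.parse` as a stand-in; it does not use can-ada's `URLSearchParams`, whose exact methods are not documented on this page. The value `a&b=c` is a made-up example chosen because it contains the separator characters.

```python
from urllib.parse import parse_qsl, urlencode

value = "a&b=c"  # contains the query separators '&' and '='

# Naive concatenation: the separators inside the value change the structure.
naive = "?q=" + value

# Structured encoding percent-escapes the separators inside the value.
encoded = "?" + urlencode({"q": value})

print(naive, parse_qsl(naive[1:]))      # ?q=a&b=c [('q', 'a'), ('b', 'c')]
print(encoded, parse_qsl(encoded[1:]))  # ?q=a%26b%3Dc [('q', 'a&b=c')]
```

The naive string silently turns one parameter into two, which is the kind of malformed result a structured parameter object is meant to prevent.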
Install
- pip install can-ada
Imports
- URL
  wrong: import can_ada.URL
  correct: from can_ada import URL
- parse_url
  wrong: import can_ada.parse_url
  correct: from can_ada import parse_url
- URLSearchParams
  wrong: import can_ada.URLSearchParams
  correct: from can_ada import URLSearchParams
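When an import fails, it can help to check which names the installed module actually exposes. The helper below is a small sketch with a hypothetical name (`missing_exports`); it relies only on the standard library and is demonstrated on `math` so that it runs without can-ada installed.

```python
import importlib
from typing import Iterable, List


def missing_exports(module_name: str, names: Iterable[str]) -> List[str]:
    """Return the entries of `names` that `module_name` does not expose.

    Raises ImportError if the module itself cannot be imported.
    """
    module = importlib.import_module(module_name)
    return [name for name in names if not hasattr(module, name)]


# Demonstrated on the standard library so the sketch runs anywhere.
print(missing_exports("math", ["sqrt", "no_such_name"]))  # ['no_such_name']
```

For this library the call would be `missing_exports("can_ada", ["URL", "parse_url", "URLSearchParams"])`, using the names listed on this page; an empty list means every name is available in the installed version.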
Quickstart
from can_ada import URL, parse_url
# Parse a URL string into a URL object
url_string = "https://example.com:8080/path/to/resource?query=value&foo=bar#fragment"
url_object = URL(url_string)
print(f"Original URL: {url_object.href}")
print(f"Protocol: {url_object.protocol}") # e.g., 'https:'
print(f"Host: {url_object.host}") # e.g., 'example.com:8080'
print(f"Hostname: {url_object.hostname}") # e.g., 'example.com'
print(f"Port: {url_object.port}") # e.g., '8080'
print(f"Pathname: {url_object.pathname}") # e.g., '/path/to/resource'
print(f"Search: {url_object.search}") # e.g., '?query=value&foo=bar'
print(f"Hash: {url_object.hash}") # e.g., '#fragment'
# Modify parts of the URL
url_object.hostname = "newhost.org"
url_object.pathname = "/new/path"
print(f"Modified URL: {url_object.href}")
# Using parse_url for convenience; the result's components are read as attributes
parsed = parse_url("http://test.com/page")
print(f"Parsed protocol: {parsed.protocol}")