Parse and Build Amazon S3 URLs
s3urls is a small, dependency-free Python library designed to parse and build Amazon S3 URLs. It provides utility functions to extract bucket names and keys from S3 URIs (e.g., `s3://bucket/key`) and HTTP/HTTPS S3 object URLs, as well as to construct such URLs. The current version is 0.0.3, with its last release in 2018, indicating a stable but infrequently updated project.
Common errors
- ValueError: Invalid S3 URL
  - Cause: The input string passed to `parse_s3_url` or `build_s3_url` does not match a recognized S3 URI format (e.g., `s3://bucket/key`) or HTTP/HTTPS S3 object URL (e.g., `https://bucket.s3.amazonaws.com/key`). This can happen with malformed URLs or unsupported S3 endpoint variations.
  - Fix: Ensure the URL strictly follows one of the supported S3 URL formats. Verify that bucket names, keys, regions, schemes, and styles are correctly structured. For specific regional endpoints, construct the URL string manually if `build_s3_url`'s parameters don't cover the exact format.
- Failed to resolve S3 endpoint or access an S3 bucket with a specific regional URL format
  - Cause: While `s3urls` can infer regions from many standard S3 HTTP/HTTPS URLs, it may not handle every legacy or highly specific regional endpoint format (e.g., `s3-<region>.amazonaws.com` vs. `s3.<region>.amazonaws.com` for older regions) or specialized S3-compatible services. AWS itself has alternated between the `s3-<region>` and `s3.<region>` host forms in the past.
  - Fix: When `s3urls`'s automatic parsing or building of regional URLs is problematic, explicitly pass the `region` parameter to `build_s3_url`. For complex cases or non-AWS S3-compatible services, consider using `urllib.parse` for basic component extraction, or a dedicated AWS SDK such as `boto3` for more robust, region-aware URL generation and interaction.
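As the fix above suggests, the standard library's `urllib.parse` can serve as a fallback when `s3urls` does not recognize an endpoint. The sketch below is a hypothetical helper (not part of `s3urls`) that extracts bucket, key, and region from virtual-hosted-style URLs, covering both the modern `s3.<region>` and legacy `s3-<region>` host forms:

```python
from urllib.parse import urlparse

def parse_virtual_hosted_url(url):
    """Stdlib-only fallback parser for virtual-hosted-style S3 URLs.

    Illustrative helper only: it does not validate bucket names and does
    not handle path-style, dualstack, or transfer-acceleration endpoints.
    """
    parsed = urlparse(url)
    host_parts = parsed.netloc.split(".")
    bucket = host_parts[0]
    region = None
    if host_parts[1].startswith("s3-"):
        # Legacy form: <bucket>.s3-<region>.amazonaws.com
        region = host_parts[1][len("s3-"):]
    elif host_parts[1] == "s3" and host_parts[2] != "amazonaws":
        # Modern form: <bucket>.s3.<region>.amazonaws.com
        region = host_parts[2]
    return {"bucket": bucket, "key": parsed.path.lstrip("/"), "region": region}

print(parse_virtual_hosted_url("https://my-bucket.s3.us-east-1.amazonaws.com/a/b.txt"))
# → {'bucket': 'my-bucket', 'key': 'a/b.txt', 'region': 'us-east-1'}
```

For anything beyond simple component extraction (signing, credentials, exotic endpoints), `boto3` remains the more robust choice.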
Warnings
- Gotcha: The `s3urls` library performs only rudimentary URL validation. It does NOT validate bucket names or object keys against the comprehensive rules defined in the AWS documentation, so ensure your bucket and key inputs adhere to AWS naming conventions to avoid issues with AWS services.
- Deprecated: AWS is deprecating path-style S3 URLs (e.g., `https://s3.amazonaws.com/<bucket>/<key>`) in favor of virtual-hosted-style URLs (e.g., `https://<bucket>.s3.amazonaws.com/<key>`) in new regions and for new buckets. While `s3urls` supports parsing path-style URLs, build and primarily use virtual-hosted-style URLs for future compatibility and best practice.
- Gotcha: Beware of similarly named but unrelated packages on PyPI, such as `s3url` (singular) or `s3-url-helper`. Ensure you are installing and importing `s3urls` (plural) to use this specific library.
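Migrating existing path-style URLs is mostly string surgery. The following sketch (a hypothetical helper, not part of `s3urls`) rewrites a path-style URL into its virtual-hosted-style equivalent, assuming a standard `s3[.<region>].amazonaws.com` host and a DNS-compliant bucket name:

```python
from urllib.parse import urlparse

def to_virtual_hosted(path_style_url):
    """Rewrite a path-style S3 URL as virtual-hosted-style.

    Illustrative only: assumes a standard s3[.<region>].amazonaws.com
    endpoint and a DNS-compliant bucket name (lowercase, no dots).
    """
    parsed = urlparse(path_style_url)
    # In path-style URLs the bucket is the first path segment.
    bucket, _, key = parsed.path.lstrip("/").partition("/")
    return f"{parsed.scheme}://{bucket}.{parsed.netloc}/{key}"

print(to_virtual_hosted("https://s3.amazonaws.com/my-bucket/path/to/obj.txt"))
# → https://my-bucket.s3.amazonaws.com/path/to/obj.txt
```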
Install
- `pip install s3urls`
Imports
- `parse_s3_url`: `from s3urls import parse_s3_url`
- `build_s3_url`: `from s3urls import build_s3_url`
Quickstart
```python
from s3urls import parse_s3_url, build_s3_url

# Example 1: Parsing an S3 URI
s3_uri = 's3://my-bucket/path/to/object.txt'
parsed = parse_s3_url(s3_uri)
print(f"Parsed S3 URI - Bucket: {parsed['bucket']}, Key: {parsed['key']}")

# Example 2: Parsing an HTTPS virtual-hosted-style URL
https_url = 'https://my-bucket.s3.us-east-1.amazonaws.com/another/file.csv'
parsed = parse_s3_url(https_url)
print(f"Parsed HTTPS URL - Bucket: {parsed['bucket']}, Key: {parsed['key']}, Region: {parsed['region']}")

# Example 3: Building an S3 URI
built_s3_uri = build_s3_url(bucket='my-new-bucket', key='data/results.json')
print(f"Built S3 URI: {built_s3_uri}")

# Example 4: Building an HTTPS virtual-hosted-style URL with a region
built_https_url = build_s3_url(bucket='my-other-bucket', key='images/pic.jpg',
                               region='eu-west-1', scheme='https', style='virtual-host')
print(f"Built HTTPS URL: {built_https_url}")
```