s5cmd Python Distributions
This project provides Python wheels for `s5cmd`, a high-performance command-line tool written in Go for managing S3 and S3-compatible object storage. `s5cmd` itself focuses on speed and efficiency for bulk operations, parallel processing, and advanced filtering. Installing the Python package places the `s5cmd` executable on the user's PATH, so Python applications can invoke the CLI via `subprocess`. The current version is 0.3.3; releases follow updates to the underlying `s5cmd` binary and improvements to the build infrastructure.
Warnings
- gotcha This `s5cmd` package (from `ImagingDataCommons/s5cmd-python-distributions`) primarily distributes the `s5cmd` Go binary and makes it executable. It does NOT provide a direct Python API for S3 interactions. For a Pythonic wrapper that offers `S5CmdRunner` class and direct function calls, consider `s5cmdpy` (from `trojblue/s5cmd-python`).
- gotcha `s5cmd` (the underlying Go tool) relies on standard AWS credential configuration (e.g., `~/.aws/credentials`, environment variables like `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION`) to authenticate with S3. Without proper configuration, commands will fail.
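- gotcha Achieving the advertised high performance of `s5cmd` often requires tuning concurrency parameters. In s5cmd v2 these are the global `--numworkers` flag and the `cp` options `--concurrency` and `--part-size`; the `-uw` and `-dw` flags come from pre-2.0 releases and may not be accepted by the bundled binary, so check `s5cmd --help` for what your version supports. Suboptimal settings can lead to significantly slower transfers.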
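The credential and concurrency warnings above can be turned into a small pre-flight step before calling the binary. The following is a minimal sketch, not part of this package: `has_static_credentials` and `build_s5cmd_command` are hypothetical helper names, and the credential check is only a heuristic (it looks at the two key environment variables and the default shared credentials file, and knows nothing about IAM roles or other credential sources). It assumes `--numworkers` is passed as a global flag before the subcommand.

```python
import os
from pathlib import Path


def has_static_credentials(env=None, credentials_file=None):
    """Heuristic: True if key env vars or a shared credentials file are present.

    Hypothetical helper; it does not validate the credentials themselves.
    """
    env = os.environ if env is None else env
    if credentials_file is None:
        credentials_file = Path.home() / ".aws" / "credentials"
    has_env_keys = bool(env.get("AWS_ACCESS_KEY_ID")) and bool(
        env.get("AWS_SECRET_ACCESS_KEY")
    )
    return has_env_keys or Path(credentials_file).is_file()


def build_s5cmd_command(subcommand_args, numworkers=None):
    """Return an argv list with global flags placed before the subcommand."""
    command = ["s5cmd"]
    if numworkers is not None:
        if numworkers < 1:
            raise ValueError("numworkers must be a positive integer")
        command += ["--numworkers", str(numworkers)]
    return command + list(subcommand_args)


if not has_static_credentials():
    print("Warning: no AWS keys in the environment and no ~/.aws/credentials file.")

# The bucket name is a placeholder; the list can be passed to subprocess.run.
print(build_s5cmd_command(["ls", "s3://your-test-s5cmd-bucket/"], numworkers=64))
```

Building the command as a list rather than a shell string avoids quoting problems with keys that contain spaces or wildcards.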
Install
pip install s5cmd
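After installation, one way to confirm that the executable actually landed on the PATH is to look it up with the standard library's `shutil.which`. This is a minimal sketch; `locate_s5cmd` is a hypothetical helper name, not something the package provides.

```python
import shutil


def locate_s5cmd(executable="s5cmd"):
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(executable)


path = locate_s5cmd()
if path is None:
    print("s5cmd was not found on PATH; check that the install succeeded "
          "and that the environment's scripts directory is on PATH.")
else:
    print(f"s5cmd found at: {path}")
```

Inside a virtual environment, the reported path would normally point into that environment's `bin` (or `Scripts` on Windows) directory.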
Imports
- subprocess
import subprocess
Quickstart
import subprocess
import os
# Ensure AWS credentials are configured (e.g., via environment variables or ~/.aws/credentials)
# For example, using environment variables for demonstration:
# os.environ['AWS_ACCESS_KEY_ID'] = os.environ.get('AWS_ACCESS_KEY_ID', 'YOUR_AWS_ACCESS_KEY_ID')
# os.environ['AWS_SECRET_ACCESS_KEY'] = os.environ.get('AWS_SECRET_ACCESS_KEY', 'YOUR_AWS_SECRET_ACCESS_KEY')
# os.environ['AWS_REGION'] = os.environ.get('AWS_REGION', 'us-east-1')
try:
    # Verify s5cmd is installed and accessible in PATH
    version_output = subprocess.run(['s5cmd', 'version'], capture_output=True, text=True, check=True)
    print(f"s5cmd version:\n{version_output.stdout}")

    # Example: List objects in an S3 bucket (replace with a real bucket)
    bucket_name = "your-test-s5cmd-bucket"
    list_command = ['s5cmd', 'ls', f's3://{bucket_name}/']
    list_result = subprocess.run(list_command, capture_output=True, text=True, check=True)
    print(f"\nListing objects in s3://{bucket_name}/:\n{list_result.stdout}")

    # Example: Create a dummy local file and upload it
    local_file_name = "hello_s5cmd.txt"
    with open(local_file_name, "w") as f:
        f.write("Hello from s5cmd Python distribution!")
    upload_command = ['s5cmd', 'cp', local_file_name, f's3://{bucket_name}/{local_file_name}']
    upload_result = subprocess.run(upload_command, capture_output=True, text=True, check=True)
    print(f"\nUploaded {local_file_name}:\n{upload_result.stdout}")

    # Example: Download the file back
    download_command = ['s5cmd', 'cp', f's3://{bucket_name}/{local_file_name}', f'./downloaded_{local_file_name}']
    download_result = subprocess.run(download_command, capture_output=True, text=True, check=True)
    print(f"\nDownloaded 'downloaded_{local_file_name}':\n{download_result.stdout}")

    # Clean up local files
    os.remove(local_file_name)
    os.remove(f'./downloaded_{local_file_name}')

    # Note: For production, consider robust error handling and command construction.
    # Ensure the bucket 'your-test-s5cmd-bucket' exists and credentials have write access.
except FileNotFoundError:
    print("Error: 's5cmd' command not found. Ensure it's installed and in your system's PATH.")
except subprocess.CalledProcessError as e:
    print(f"Error executing s5cmd command: {e}")
    print(f"Stdout: {e.stdout}")
    print(f"Stderr: {e.stderr}")
except Exception as e:
    print(f"An unexpected error occurred: {e}")
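The quickstart repeats the same `subprocess.run` call and error handling for every command. A possible way to factor that out is sketched below. `run_s5cmd` and `S5cmdError` are hypothetical names introduced for illustration, not part of this package, and the `runner` parameter exists only so the function can be exercised without a real `s5cmd` binary.

```python
import subprocess


class S5cmdError(RuntimeError):
    """Raised when s5cmd is missing or exits with a non-zero status."""


def run_s5cmd(args, executable="s5cmd", runner=subprocess.run):
    """Run s5cmd with the given argument list and return stdout as lines."""
    command = [executable, *args]
    try:
        completed = runner(command, capture_output=True, text=True, check=True)
    except FileNotFoundError as exc:
        raise S5cmdError(f"'{executable}' was not found on PATH") from exc
    except subprocess.CalledProcessError as exc:
        # Surface the tool's own error message rather than only the exit code.
        raise S5cmdError(
            f"{' '.join(command)} exited with {exc.returncode}: {exc.stderr}"
        ) from exc
    return completed.stdout.splitlines()


if __name__ == "__main__":
    try:
        for line in run_s5cmd(["version"]):
            print(line)
    except S5cmdError as err:
        print(f"s5cmd call failed: {err}")
```

With a helper like this, the listing step in the quickstart would reduce to `run_s5cmd(["ls", f"s3://{bucket_name}/"])`, and callers would handle a single exception type.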