{"id":5717,"library":"s5cmd","title":"s5cmd Python Distributions","description":"This project provides Python wheels for the high-performance `s5cmd` command-line tool, a utility written in Go for managing S3 and S3-compatible object storage systems. It focuses on speed and efficiency for bulk operations, parallel processing, and advanced filtering. The Python package ensures the `s5cmd` executable is available in the user's PATH after installation, allowing Python applications to invoke the CLI tool via subprocess. The current version is 0.3.3, with a release cadence tied to updates of the underlying `s5cmd` binary and build infrastructure improvements.","status":"active","version":"0.3.3","language":"en","source_language":"en","source_url":"https://github.com/jcfr/s5cmd-python-distributions","tags":["aws","s3","object storage","cli wrapper","performance","go"],"install":[{"cmd":"pip install s5cmd","lang":"bash","label":"Install `s5cmd` distribution package"}],"dependencies":[],"imports":[{"note":"The `s5cmd` package primarily installs the `s5cmd` command-line executable. 
Interaction from Python is typically done by invoking this executable using the `subprocess` module.","symbol":"subprocess","correct":"import subprocess"}],"quickstart":{"code":"import subprocess\nimport os\n\n# Ensure AWS credentials are configured (e.g., via environment variables or ~/.aws/credentials)\n# For example, using environment variables for demonstration:\n# os.environ['AWS_ACCESS_KEY_ID'] = os.environ.get('AWS_ACCESS_KEY_ID', 'YOUR_AWS_ACCESS_KEY_ID')\n# os.environ['AWS_SECRET_ACCESS_KEY'] = os.environ.get('AWS_SECRET_ACCESS_KEY', 'YOUR_AWS_SECRET_ACCESS_KEY')\n# os.environ['AWS_REGION'] = os.environ.get('AWS_REGION', 'us-east-1')\n\ntry:\n    # Verify s5cmd is installed and accessible in PATH\n    version_output = subprocess.run(['s5cmd', 'version'], capture_output=True, text=True, check=True)\n    print(f\"s5cmd version:\\n{version_output.stdout}\")\n\n    # Example: List objects in an S3 bucket (replace with a real bucket)\n    bucket_name = \"your-test-s5cmd-bucket\"\n    list_command = ['s5cmd', 'ls', f's3://{bucket_name}/']\n    list_result = subprocess.run(list_command, capture_output=True, text=True, check=True)\n    print(f\"\\nListing objects in s3://{bucket_name}/:\\n{list_result.stdout}\")\n\n    # Example: Create a dummy local file and upload it\n    local_file_name = \"hello_s5cmd.txt\"\n    with open(local_file_name, \"w\") as f:\n        f.write(\"Hello from s5cmd Python distribution!\")\n    \n    upload_command = ['s5cmd', 'cp', local_file_name, f's3://{bucket_name}/{local_file_name}']\n    upload_result = subprocess.run(upload_command, capture_output=True, text=True, check=True)\n    print(f\"\\nUploaded {local_file_name}:\\n{upload_result.stdout}\")\n\n    # Example: Download the file back\n    download_command = ['s5cmd', 'cp', f's3://{bucket_name}/{local_file_name}', f'./downloaded_{local_file_name}']\n    download_result = subprocess.run(download_command, capture_output=True, text=True, check=True)\n    print(f\"\\nDownloaded 
'downloaded_{local_file_name}':\\n{download_result.stdout}\")\n    \n    # Clean up local files\n    os.remove(local_file_name)\n    os.remove(f'./downloaded_{local_file_name}')\n\n    # Note: For production, consider robust error handling and command construction.\n    # Ensure the bucket 'your-test-s5cmd-bucket' exists and credentials have write access.\n\nexcept FileNotFoundError:\n    print(\"Error: 's5cmd' command not found. Ensure it's installed and in your system's PATH.\")\nexcept subprocess.CalledProcessError as e:\n    print(f\"Error executing s5cmd command: {e}\")\n    print(f\"Stdout: {e.stdout}\")\n    print(f\"Stderr: {e.stderr}\")\nexcept Exception as e:\n    print(f\"An unexpected error occurred: {e}\")\n","lang":"python","description":"This quickstart demonstrates how to check for `s5cmd` and execute basic S3 operations (list, upload, download) using Python's `subprocess` module. It assumes `s5cmd` is correctly installed via `pip install s5cmd` and that AWS credentials are configured in the environment or standard locations for `s5cmd` to access S3."},"warnings":[{"fix":"If you need a direct Python API, install `s5cmdpy` (`pip install s5cmdpy`) instead. If you intend to call the command-line tool, use `subprocess` as shown in the quickstart.","message":"This `s5cmd` package (from `ImagingDataCommons/s5cmd-python-distributions`) primarily distributes the `s5cmd` Go binary and makes it executable. It does NOT provide a direct Python API for S3 interactions. 
For a Pythonic wrapper that offers an `S5CmdRunner` class and direct function calls, consider `s5cmdpy` (from `trojblue/s5cmd-python`).","severity":"gotcha","affected_versions":"All versions"},{"fix":"Ensure your environment or `~/.aws/credentials` file is correctly set up with valid AWS credentials and region information before running `s5cmd` commands.","message":"`s5cmd` (the underlying Go tool) relies on standard AWS credential configuration (e.g., `~/.aws/credentials`, environment variables like `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION`) to authenticate with S3. Without proper configuration, commands that require authentication will fail.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Review the official `s5cmd` documentation for the global `--numworkers` flag and the `cp` options `--concurrency` (`-c`) and `--part-size` (`-p`), and adjust them based on your network, S3 bucket, and file characteristics to maximize throughput. Example: `s5cmd --numworkers 64 cp --concurrency 10 ...`","message":"Achieving the advertised high performance of `s5cmd` may require tuning its concurrency parameters: the global `--numworkers` flag (the size of the worker pool) and, for large files, the `cp` options `--concurrency` and `--part-size`. Settings that do not suit the workload can lead to significantly slower transfers.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-13T00:00:00.000Z","next_check":"2026-07-12T00:00:00.000Z"}