{"id":3984,"library":"dvc-s3","title":"DVC S3 Remote Plugin","description":"dvc-s3 is a plugin for Data Version Control (DVC) that enables storing and retrieving data, models, and pipelines from Amazon S3. It integrates seamlessly with DVC's CLI and API to manage datasets on S3. The current version is 3.3.0, and it follows a minor release cadence driven by DVC's core development.","status":"active","version":"3.3.0","language":"en","source_language":"en","source_url":"https://github.com/iterative/dvc-s3","tags":["dvc","s3","cloud storage","data version control","machine learning","mlops"],"install":[{"cmd":"pip install dvc dvc-s3","lang":"bash","label":"Install DVC with S3 support"}],"dependencies":[{"reason":"This is a plugin for DVC; DVC must be installed separately to use dvc-s3 functionality.","package":"dvc","optional":false},{"reason":"Provides the underlying filesystem interface for S3 storage. Minimum version >=2024.12.0 required since v3.2.1.","package":"s3fs","optional":false},{"reason":"Was a transitive dependency through s3fs but explicitly dropped in v3.3.0. May still be needed if your project implicitly relied on it or requires it for other S3 operations.","package":"boto3","optional":true}],"imports":[],"quickstart":{"code":"import os\nimport subprocess\nimport shutil\n\n# Ensure dvc and dvc-s3 are installed: `pip install dvc dvc-s3`\n\n# --- Configuration for S3 (replace with your actual details) ---\n# For this example to work with a real S3 bucket, you need valid AWS credentials\n# and an S3 bucket. 
It's recommended to set them as environment variables:\n# export AWS_ACCESS_KEY_ID='AKIA...'\n# export AWS_SECRET_ACCESS_KEY='YOUR_SECRET_KEY'\n# export DVC_S3_BUCKET='your-dvc-test-bucket-name'\n# export AWS_DEFAULT_REGION='us-east-1'\n\naws_access_key_id = os.environ.get('AWS_ACCESS_KEY_ID', 'YOUR_AWS_ACCESS_KEY_ID_PLACEHOLDER')\naws_secret_access_key = os.environ.get('AWS_SECRET_ACCESS_KEY', 'YOUR_AWS_SECRET_ACCESS_KEY_PLACEHOLDER')\ns3_bucket_name = os.environ.get('DVC_S3_BUCKET', 'your-dvc-test-bucket-name')\ns3_region = os.environ.get('AWS_DEFAULT_REGION', 'us-east-1')\n\nif 'YOUR_AWS_ACCESS_KEY_ID_PLACEHOLDER' in aws_access_key_id or 'your-dvc-test-bucket-name' in s3_bucket_name:\n    print(\"\\nWARNING: Please set AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, DVC_S3_BUCKET, and AWS_DEFAULT_REGION environment variables for the quickstart to interact with real S3.\")\n    print(\"Proceeding with placeholder values, push will likely fail.\")\n\nproject_dir = \"dvc-s3-quickstart\"\n\n# Clean up previous runs if any (optional, uncomment if needed for repeated runs)\n# if os.path.exists(project_dir):\n#     shutil.rmtree(project_dir)\n\nos.makedirs(project_dir, exist_ok=True)\nos.chdir(project_dir)\n\ntry:\n    # 1. Initialize DVC repository (without Git for simplicity)\n    print(\"\\n1. Initializing DVC repository...\")\n    subprocess.run([\"dvc\", \"init\", \"--no-scm\"], check=True)\n\n    # 2. Create a dummy data directory and file\n    os.makedirs(\"data\", exist_ok=True)\n    with open(\"data/my_data.txt\", \"w\") as f:\n        f.write(\"Hello, DVC and S3!\")\n\n    # 3. Add S3 remote\n    print(f\"\\n3. Adding S3 remote 'my_s3_remote' to s3://{s3_bucket_name}/dvc-store\")\n    subprocess.run([\"dvc\", \"remote\", \"add\", \"-d\", \"my_s3_remote\", f\"s3://{s3_bucket_name}/dvc-store\"], check=True)\n    subprocess.run([\"dvc\", \"remote\", \"modify\", \"my_s3_remote\", \"region\", s3_region], check=True)\n\n    # 4. Add data to DVC\n    print(\"\\n4. 
Adding 'data/my_data.txt' to DVC...\")\n    subprocess.run([\"dvc\", \"add\", \"data/my_data.txt\"], check=True)\n\n    # 5. Push data to the S3 remote\n    print(\"\\n5. Pushing data to S3 remote...\")\n    subprocess.run([\"dvc\", \"push\"], check=True)\n    print(\"\\nQuickstart completed! Check your S3 bucket for the DVC store.\")\n\nexcept subprocess.CalledProcessError as e:\n    # Output is not captured by the subprocess.run calls, so e.stdout and e.stderr are None;\n    # DVC's own messages have already been written to the terminal.\n    print(f\"\\nERROR: DVC command failed with exit code {e.returncode}.\\nCommand: {' '.join(e.cmd)}\")\n    print(\"Please ensure DVC and dvc-s3 are installed, AWS credentials are set, and the S3 bucket exists and is writable.\")\nexcept FileNotFoundError:\n    print(\"ERROR: 'dvc' command not found. Please ensure DVC is installed and in your PATH.\")\nfinally:\n    os.chdir(\"..\")  # Return to original directory\n    # Optional: Clean up the created project directory\n    # shutil.rmtree(project_dir)\n","lang":"python","description":"This quickstart initializes a DVC project, configures an S3 remote, adds a data file to DVC, and pushes it to your S3 bucket. Ensure `dvc` and `dvc-s3` are installed and that your AWS credentials (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY) and target S3 bucket name (DVC_S3_BUCKET) are set as environment variables. AWS_DEFAULT_REGION is optional and defaults to `us-east-1` in the script."},"warnings":[{"fix":"If you need `boto3` for other reasons in your environment, install it separately: `pip install boto3`.","message":"The `boto3` library is no longer a direct dependency as of version 3.3.0. If your project implicitly relied on `boto3` being installed alongside `dvc-s3` for other S3-related operations, you will now need to install it explicitly (e.g., `pip install boto3`).","severity":"breaking","affected_versions":">=3.3.0"},{"fix":"Always install `dvc-s3` alongside `dvc` using `pip install dvc dvc-s3` to ensure the S3 remote is available.","message":"The `dvc-s3` plugin must be installed separately from `dvc`. Installing just `dvc` will not provide S3 remote support. 
Always use `pip install dvc dvc-s3` to ensure S3 functionality.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Ensure AWS credentials are correctly configured via environment variables, AWS shared credentials file, or IAM roles. Verify the S3 bucket exists and DVC has appropriate permissions (read/write access).","message":"AWS credentials (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY) and S3 bucket configuration are crucial. Incorrectly set credentials or a non-existent/inaccessible bucket are common causes of errors when attempting to access S3 remotes.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Upgrade `s3fs` by running `pip install --upgrade s3fs`.","message":"Version 3.2.1 and later require `s3fs>=2024.12.0`. If you are running an older version of `s3fs`, you must upgrade to ensure compatibility and stability when using `dvc-s3`.","severity":"breaking","affected_versions":">=3.2.1"},{"fix":"After adding your remote, use `dvc remote modify <remote_name> region <your_region>` (e.g., `dvc remote modify my_s3_remote region us-east-1`).","message":"Explicitly setting the S3 region for your remote is often good practice to avoid ambiguity or issues with AWS endpoint resolution, especially when working across different regions or with specific AWS configurations.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-11T00:00:00.000Z","next_check":"2026-07-10T00:00:00.000Z"}