dvc-gs: Google Cloud Storage Plugin for DVC

3.0.2 · active · verified Thu Apr 16

dvc-gs is the official Google Cloud Storage plugin for DVC (Data Version Control). It lets DVC store and retrieve data artifacts in Google Cloud Storage buckets, so large files and models can be versioned in the cloud. The current version is 3.0.2. Releases typically follow DVC's cadence, with frequent updates that track core DVC features and bug fixes.

Quickstart

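This quickstart demonstrates how to initialize a DVC repository, configure a Google Cloud Storage remote through `dvc-gs`, add a data file, and push it to the GCS bucket. It assumes `dvc` and `dvc-gs` are installed and that GCS credentials are available (e.g., via the `GOOGLE_APPLICATION_CREDENTIALS` environment variable or an authenticated `gcloud` session). Set the `GCS_BUCKET_NAME` environment variable, or replace `your-gcs-bucket` with your actual bucket name. The script drives DVC through its Python `Repo` class, which sits outside the documented `dvc.api` module and may change between releases; the equivalent CLI commands are `dvc init`, `dvc remote add`, `dvc add`, and `dvc push`.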
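Before running the quickstart, it can help to confirm that the credentials variable points at something usable. The following is a minimal sketch using only the standard library; `check_gcs_credentials` and its status strings are hypothetical names chosen for illustration, not part of `dvc` or `dvc-gs`. It only checks that the variable is set and that the file parses as JSON (service account keys are JSON files); it does not contact Google Cloud.

```python
import json
import os
from pathlib import Path


def check_gcs_credentials(env=None):
    """Return a short status string for GOOGLE_APPLICATION_CREDENTIALS.

    Hypothetical helper: 'unset', 'missing-file', 'invalid-json', or 'ok'.
    """
    env = os.environ if env is None else env
    key_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not key_path:
        # gcloud application-default credentials may still work in this case.
        return "unset"
    path = Path(key_path)
    if not path.is_file():
        return "missing-file"
    try:
        # Service account keys are JSON documents; parse to catch truncation.
        json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return "invalid-json"
    return "ok"


if __name__ == "__main__":
    print(check_gcs_credentials())
```

A result of `unset` is not necessarily an error, since an authenticated `gcloud` session can supply credentials instead.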

import os
from dvc.repo import Repo

# Initialize DVC in a new directory.
# no_scm=True allows initialization outside a Git repository;
# drop it if you run `git init` in the directory first.
os.makedirs('my_project', exist_ok=True)
os.chdir('my_project')
repo = Repo.init(no_scm=True)

# Configure a Google Cloud Storage remote
# Replace 'your-gcs-bucket' with your actual bucket name
# Ensure GOOGLE_APPLICATION_CREDENTIALS points to a service account key or use gcloud auth
if not os.environ.get('GOOGLE_APPLICATION_CREDENTIALS'):
    print("Warning: GOOGLE_APPLICATION_CREDENTIALS not set. Ensure gcloud is authenticated or anonymous access is allowed for the bucket.")

bucket = os.environ.get("GCS_BUCKET_NAME", "your-gcs-bucket")

# Repo has no remote.add() method; remotes live in the repo config,
# which is what `dvc remote add -d my_gs_remote gs://...` edits.
with repo.config.edit() as conf:
    conf["remote"]["my_gs_remote"] = {"url": f"gs://{bucket}"}
    conf["core"]["remote"] = "my_gs_remote"  # make it the default remote

# Create a dummy data file
with open('data.txt', 'w') as f:
    f.write('hello dvc-gs')

# Add the file to DVC and push to GCS
repo.add('data.txt')
repo.push(targets=['data.txt'], remote='my_gs_remote')
print("data.txt added and pushed to GCS.")

# To verify, check your GCS bucket, or pull the data into another location.
# Repo has no clone() method; copy or `git clone` the project (including
# data.txt.dvc and .dvc/config, but not .dvc/cache), then in the copy:
# new_repo = Repo('.')
# new_repo.pull(targets=['data.txt'])
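To make the "check your GCS bucket" step concrete, the sketch below computes where the pushed object would be expected to appear. It assumes the DVC 3.x content-addressed layout, in which objects are stored under `files/md5/<first two hex digits>/<remaining digits>`; that layout is an assumption about DVC internals and may differ in other versions. `expected_remote_object` is a hypothetical helper, not a `dvc` or `dvc-gs` function.

```python
import hashlib


def expected_remote_object(bucket, file_path):
    """Guess the gs:// URL of a pushed file.

    Assumes DVC 3.x stores objects at files/md5/<2 hex>/<30 hex>.
    """
    md5 = hashlib.md5()
    with open(file_path, "rb") as f:
        # Hash in 1 MiB chunks so large data files are not loaded at once.
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            md5.update(chunk)
    digest = md5.hexdigest()
    return f"gs://{bucket}/files/md5/{digest[:2]}/{digest[2:]}"
```

For example, `expected_remote_object("your-gcs-bucket", "data.txt")` returns a URL that can be compared against the bucket listing, and the same digest should appear in the `md5` field of `data.txt.dvc`.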
