dvc-gs: Google Cloud Storage Plugin for DVC
dvc-gs is the official Google Cloud Storage plugin for DVC (Data Version Control). It lets DVC store and retrieve data artifacts in Google Cloud Storage buckets, so users can version large files and models in the cloud. The current version is 3.0.2. Releases generally follow DVC's release cadence, with frequent updates to align with core DVC features and bug fixes.
Common errors
- ERROR: Plugin 'gs' is not found. Check if the plugin is installed.
  cause: The dvc-gs plugin is not installed or DVC cannot find it. This usually happens when you install `dvc` without `dvc-gs` or `dvc[gs]`.
  fix: Install the plugin: `pip install dvc-gs` or `pip install "dvc[gs]"`.
- gcsfs.exceptions.NoCredentialsError: NoCredentialsError: Could not determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS, provide explicit credentials, or load them from gcloud config.
  cause: dvc-gs (via gcsfs) could not find any Google Cloud credentials in the environment.
  fix: Set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable to the path of your service account key file, or run `gcloud auth application-default login` to set up user credentials.
- dvc.api.DvcException: 'gs://your-gcs-bucket' is not a valid DVC remote or cannot be accessed.
  cause: This generic error usually points to a problem with the remote configuration: a typo in the URL, a bucket that does not exist, or insufficient permissions even though credentials were found.
  fix: Double-check the GCS bucket URL for typos. Verify that the bucket exists and that the configured credentials have read/write access to it.
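A typo in the remote URL can be caught before DVC ever contacts the bucket. The following is a minimal sketch using only the standard library; `parse_gs_url` is a hypothetical helper, not part of DVC or dvc-gs, and it checks the URL shape only, not whether the bucket exists or is accessible.

```python
from urllib.parse import urlparse


def parse_gs_url(url):
    """Split a gs:// URL into (bucket, prefix); raise ValueError if malformed."""
    parsed = urlparse(url)
    if parsed.scheme != "gs":
        raise ValueError(f"expected a gs:// URL, got scheme {parsed.scheme!r}")
    if not parsed.netloc:
        raise ValueError("bucket name is missing from the URL")
    # The path keeps its leading slash; strip it to get the object prefix.
    return parsed.netloc, parsed.path.lstrip("/")
```

For example, `parse_gs_url("gs://your-gcs-bucket/dvc")` returns the bucket name and the prefix `dvc`, while `parse_gs_url("gs:/your-gcs-bucket")` raises `ValueError` because the second slash is missing.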
Warnings
- breaking: DVC 3.0.0 (and consequently dvc-gs 3.0.0) introduced significant changes to the DVC API, particularly affecting programmatic usage with `dvc.api.get_url()`, `dvc.api.open()`, and `dvc.api.read()`. Code written for DVC 2.x will likely break with DVC 3.x.
- gotcha: dvc-gs relies on Google Cloud credentials. Without proper authentication, operations fail with credential or permission errors. Common methods include setting `GOOGLE_APPLICATION_CREDENTIALS` or using `gcloud auth application-default login`.
- gotcha: While `dvc-gs` version 3.0.2 introduced support for anonymous GCS login, earlier versions or specific bucket configurations might still require explicit authentication even for public data, leading to unexpected `AccessDenied` errors.
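To see which of the two credential methods named above is likely to be picked up, a quick local check can help. This is a rough sketch, not how dvc-gs or gcsfs resolve credentials internally: `find_credential_source` and its return labels are hypothetical, and the application-default path shown is the Linux/macOS location written by `gcloud auth application-default login` (Windows uses `%APPDATA%\gcloud` instead). Other sources, such as a metadata server on Google Cloud VMs, are not detected.

```python
import os
from pathlib import Path


def find_credential_source(environ=None, home=None):
    """Return a label for the first credential source found, or None."""
    environ = os.environ if environ is None else environ
    home = Path.home() if home is None else Path(home)

    # 1. Explicit key file named by the environment variable.
    key_path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if key_path:
        return "env_key_file" if Path(key_path).is_file() else "env_key_missing"

    # 2. User credentials created by `gcloud auth application-default login`
    #    (Linux/macOS location; an assumption that does not hold on Windows).
    adc = home / ".config" / "gcloud" / "application_default_credentials.json"
    if adc.is_file():
        return "gcloud_adc"

    return None
```

A result of `None` or `"env_key_missing"` suggests that a push or pull against a private bucket will fail until one of the two methods is set up.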
Install
- pip install dvc-gs
- pip install "dvc[gs]"
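After installing, it can be useful to confirm that both distributions are visible to the interpreter DVC runs under. The sketch below uses the standard library's `importlib.metadata`; `installed_version` is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names as published on PyPI.
    for name in ("dvc", "dvc-gs"):
        print(name, installed_version(name) or "not installed")
```

If `dvc-gs` is reported as not installed while `dvc` is present, the plugin was probably installed into a different environment.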
Imports
- dvc.api
import dvc.api
# dvc-gs functionality is implicitly used by dvc.api calls
# when a Google Cloud Storage remote is configured.
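Because the plugin is used implicitly, reading a tracked file from a GCS remote needs no dvc-gs import at all. The following is a minimal sketch, assuming a DVC project whose remote points at a `gs://` URL; `read_tracked_text` is a hypothetical wrapper, and the `data.txt` and `my_gs_remote` names in the usage note mirror the Quickstart.

```python
def read_tracked_text(path, repo=None, remote=None):
    """Return the contents of a DVC-tracked file as text.

    `repo` is a path or URL of the DVC project (current directory if None);
    `remote` is the name of a configured remote (project default if None).
    """
    # Imported lazily so this module still loads where DVC is not installed.
    import dvc.api

    return dvc.api.read(path, repo=repo, remote=remote, mode="r")
```

Calling `read_tracked_text("data.txt", remote="my_gs_remote")` from inside the project should return the file's contents, fetching them from the bucket if they are not in the local cache.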
Quickstart
import os
from dvc.repo import Repo

# Initialize DVC in a new directory. no_scm=True is needed because the
# directory is not a Git repository; omit it if you run `git init` first.
os.makedirs('my_project', exist_ok=True)
os.chdir('my_project')
repo = Repo.init(no_scm=True)

# Configure a Google Cloud Storage remote.
# Replace 'your-gcs-bucket' with your actual bucket name.
# Ensure GOOGLE_APPLICATION_CREDENTIALS points to a service account key or use gcloud auth.
if not os.environ.get('GOOGLE_APPLICATION_CREDENTIALS'):
    print("Warning: GOOGLE_APPLICATION_CREDENTIALS not set. Ensure gcloud is authenticated or anonymous access is allowed for the bucket.")
bucket = os.environ.get('GCS_BUCKET_NAME', 'your-gcs-bucket')
# Repo has no remote.add() helper; register the remote by editing the
# project config (.dvc/config), as the `dvc remote add` command does.
with repo.config.edit() as conf:
    conf['remote']['my_gs_remote'] = {'url': f'gs://{bucket}'}

# Create a dummy data file
with open('data.txt', 'w') as f:
    f.write('hello dvc-gs')

# Add the file to DVC and push to GCS. The remote is named explicitly
# because no default remote has been set.
repo.add('data.txt')
repo.push(targets=['data.txt'], remote='my_gs_remote')
print("data.txt added and pushed to GCS.")

# To verify, check your GCS bucket, or delete data.txt and the local cache
# (.dvc/cache) and restore the file from the remote:
# repo.pull(targets=['data.txt'], remote='my_gs_remote')