Google Cloud Storage Transfer API Client Library
The `google-cloud-storage-transfer` library is the official Python client for the Google Cloud Storage Transfer Service. It enables programmatic control over data transfers to and from Google Cloud Storage, supporting various sources including other cloud providers and on-premises systems. Currently at version 1.20.0, this library adheres to Google Cloud's frequent release cadence, often receiving updates alongside other Python client libraries in the `googleapis/google-cloud-python` monorepo.
Warnings
- gotcha The Storage Transfer Service API must be explicitly enabled in your Google Cloud Project before use. Additionally, ensure proper authentication is configured, ideally using Application Default Credentials (ADC).
- gotcha The service has specific quotas and limits, including rate limits (e.g., 600 requests/min/project) and a 5 TiB maximum object size for transfers to Cloud Storage. Exceeding these limits can lead to failures or throttling.
- gotcha Insufficient IAM permissions are a common cause of transfer failures. The service account or user initiating the transfer needs permissions to create and manage transfer jobs, and read/write access to the source and destination resources.
- breaking If you are migrating from, or encounter older code using, the `google-api-services-storagetransfer` library, be aware that it is a legacy Google API Client Library; `google-cloud-storage-transfer` is the recommended modern Cloud Client Library.
- gotcha When troubleshooting failed transfer jobs, it's crucial to enable and inspect logs to understand the root cause, especially for agent-based transfers (on-premises to cloud).
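The rate limits above typically surface as throttling errors (the real client raises `google.api_core.exceptions.ResourceExhausted` for HTTP 429). A minimal, library-agnostic retry sketch, assuming a hypothetical `ThrottledError` stand-in and `call_with_backoff` helper:

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a rate-limit error raised by the API client."""


def call_with_backoff(func, max_attempts=5, base_delay=1.0):
    """Retry `func` with exponential backoff plus jitter on throttling errors."""
    for attempt in range(max_attempts):
        try:
            return func()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the caller.
            # Exponential backoff: base, 2x, 4x, ... plus up to 1s of jitter.
            time.sleep(base_delay * (2 ** attempt) + random.random())
```

With the real client you would catch `google.api_core.exceptions.ResourceExhausted` instead of `ThrottledError`; many calls can also be configured with the client library's built-in `retry` argument.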
Install
pip install google-cloud-storage-transfer
Imports
- StorageTransferServiceClient
from google.cloud.storage_transfer import StorageTransferServiceClient
Quickstart
import os

from google.cloud.storage_transfer import StorageTransferServiceClient


def create_and_run_gcs_to_gcs_transfer_job(
    project_id: str,
    source_bucket_name: str,
    sink_bucket_name: str,
    job_description: str,
):
    """Creates and runs a one-time transfer job between two GCS buckets."""
    client = StorageTransferServiceClient()

    # Transfer job configuration. Without a `schedule`, the job will not run
    # on its own; it must be started explicitly with run_transfer_job below.
    transfer_job = {
        "project_id": project_id,
        "description": job_description,
        "transfer_spec": {
            "gcs_data_source": {"bucket_name": source_bucket_name},
            "gcs_data_sink": {"bucket_name": sink_bucket_name},
        },
        "status": "ENABLED",  # The job must be ENABLED before it can be run.
    }

    try:
        # Create the transfer job.
        created_job = client.create_transfer_job({"transfer_job": transfer_job})
        print(f"Created transfer job: {created_job.name}")

        # Start the job now. run_transfer_job returns a long-running
        # operation; call .result() on it to block until the transfer finishes.
        client.run_transfer_job(
            {"job_name": created_job.name, "project_id": project_id}
        )
        print(f"Transfer job '{created_job.name}' initiated.")
    except Exception as e:
        print(f"Error creating or running transfer job: {e}")
# Example usage (replace with your actual project and bucket names)
if __name__ == "__main__":
    # Ensure these environment variables are set or replace the placeholder
    # values. For local development, `gcloud auth application-default login`
    # typically provides Application Default Credentials.
    PROJECT_ID = os.environ.get("GCP_PROJECT_ID", "your-gcp-project-id")
    SOURCE_BUCKET = os.environ.get("GCP_SOURCE_BUCKET", "your-source-gcs-bucket")
    SINK_BUCKET = os.environ.get("GCP_SINK_BUCKET", "your-sink-gcs-bucket")
    JOB_DESCRIPTION = "My Python quickstart GCS to GCS transfer"

    if (
        PROJECT_ID == "your-gcp-project-id"
        or SOURCE_BUCKET == "your-source-gcs-bucket"
        or SINK_BUCKET == "your-sink-gcs-bucket"
    ):
        print(
            "Please set GCP_PROJECT_ID, GCP_SOURCE_BUCKET, and GCP_SINK_BUCKET "
            "environment variables or replace the placeholder values."
        )
    else:
        create_and_run_gcs_to_gcs_transfer_job(
            PROJECT_ID, SOURCE_BUCKET, SINK_BUCKET, JOB_DESCRIPTION
        )
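The quickstart creates a one-time job; for recurring transfers, the TransferJob message also accepts a `schedule`, whose dates are `google.type.Date`-style year/month/day mappings. A sketch of a plain-dict builder (the `build_recurring_transfer_job` helper is hypothetical, but the field names mirror the TransferJob and Schedule protos; a job with a `schedule_start_date` and no `schedule_end_date` repeats daily):

```python
from datetime import date


def build_recurring_transfer_job(
    project_id: str,
    source_bucket: str,
    sink_bucket: str,
    start: date,
    description: str = "Recurring GCS-to-GCS transfer",
) -> dict:
    """Build the request mapping for a daily recurring GCS-to-GCS transfer job."""
    return {
        "project_id": project_id,
        "description": description,
        "status": "ENABLED",
        "transfer_spec": {
            "gcs_data_source": {"bucket_name": source_bucket},
            "gcs_data_sink": {"bucket_name": sink_bucket},
        },
        # Omitting schedule_end_date makes the job repeat daily from the
        # start date until it is disabled or deleted.
        "schedule": {
            "schedule_start_date": {
                "year": start.year,
                "month": start.month,
                "day": start.day,
            },
        },
    }
```

The resulting mapping can be passed to `client.create_transfer_job({"transfer_job": job})` exactly as in the quickstart; because the job has a schedule, no explicit `run_transfer_job` call is needed.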