Google Cloud Batch
The `google-cloud-batch` Python client library provides programmatic access to the Google Cloud Batch API, a fully managed service for running batch jobs at scale. It simplifies the orchestration of high-performance computing (HPC), AI/ML, and data processing workloads by handling infrastructure provisioning, scheduling, execution, and cleanup. The library is currently at version 0.20.0 and is part of the `google-cloud-python` monorepo, which typically sees frequent releases.
Warnings
- breaking As a pre-GA (0.x.x) client library, `google-cloud-batch` may change its API surface and underlying RPCs in backward-incompatible ways without a major version bump, so any update can break existing code.
- gotcha Authentication with Google Cloud client libraries often relies on Application Default Credentials (ADC). Hardcoding service account key JSON files directly into applications is a common anti-pattern and security risk.
- gotcha Batch job creation can fail due to insufficient IAM permissions (e.g., `iam.serviceAccounts.actAs`) for the service account used by the job or due to insufficient resource quotas in the specified region.
- gotcha Jobs that specify Compute Engine (or custom) VM OS images with outdated kernels might fail unexpectedly.
- gotcha The client library's internal logging can be verbose and may contain sensitive information. By default, logging events from the library are not handled.
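The logging gotcha above can be handled with the standard `logging` module. The sketch below assumes the library emits its events under the `google` logger namespace, as Google's Python client libraries generally do. The helper name `enable_google_client_logging` and the handler name are illustrative and not part of the library. Because records may contain sensitive information, it defaults to WARNING rather than DEBUG.

```python
import logging
import sys

# Hypothetical handler name, used only to avoid attaching the handler twice.
HANDLER_NAME = "google-client-stderr"


def enable_google_client_logging(level: int = logging.WARNING) -> logging.Logger:
    """Attach a stderr handler to the 'google' logger (assumed namespace)."""
    logger = logging.getLogger("google")
    logger.setLevel(level)

    # Add the handler only once, even if this function is called repeatedly.
    if not any(h.name == HANDLER_NAME for h in logger.handlers):
        handler = logging.StreamHandler(sys.stderr)
        handler.set_name(HANDLER_NAME)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)

    # Stop records from also reaching root-logger handlers, which would
    # duplicate output and could forward sensitive data to other sinks.
    logger.propagate = False
    return logger
```

Calling `enable_google_client_logging(logging.DEBUG)` temporarily while debugging, and reverting to the default afterwards, keeps verbose output out of long-lived logs.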
Install
pip install google-cloud-batch
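Since 0.x releases may change incompatibly, it can help to detect when the installed client has drifted from the release series you tested against. The following is a minimal sketch using the standard library's `importlib.metadata`. The function names are hypothetical, and the default of `0.20.0` is simply the version mentioned in the description above.

```python
from importlib import metadata
from typing import Optional


def same_minor_series(installed: str, tested: str) -> bool:
    """Return True when both versions share the same major.minor prefix."""
    return installed.split(".")[:2] == tested.split(".")[:2]


def check_batch_client_version(tested: str = "0.20.0") -> Optional[bool]:
    """Compare the installed google-cloud-batch with the tested version.

    Returns None when the package is not installed, otherwise whether the
    installed version is in the same major.minor series as `tested`.
    """
    try:
        installed = metadata.version("google-cloud-batch")
    except metadata.PackageNotFoundError:
        return None
    return same_minor_series(installed, tested)
```

A `False` result is a signal to re-run your tests before deploying. Pinning the same series in a requirements file, for example `google-cloud-batch>=0.20,<0.21`, avoids picking up a new minor release unintentionally.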
Imports
- BatchServiceClient
from google.cloud import batch_v1
client = batch_v1.BatchServiceClient()
- Job
from google.cloud.batch_v1 import types
job = types.Job(...)
Quickstart
import os

from google.cloud import batch_v1
from google.cloud.batch_v1 import types


def create_simple_container_job(
    project_id: str,
    region: str,
    job_name: str,
) -> types.Job:
    """Creates and runs a simple container job in Google Cloud Batch."""
    client = batch_v1.BatchServiceClient()

    # Define what will be done as part of the job.
    runnable = types.Runnable()
    runnable.container = types.Runnable.Container(
        image_uri="gcr.io/google-containers/busybox",
        entrypoint="/bin/sh",
        commands=[
            "-c",
            "echo Hello world! This is task ${BATCH_TASK_INDEX}. This job has a total of ${BATCH_TASK_COUNT} tasks.",
        ],
    )

    # Jobs can be divided into tasks. In this case, we have one task group with one task.
    task_spec = types.TaskSpec(runnables=[runnable])
    task_group = types.TaskGroup(
        task_spec=task_spec,
        task_count=1,
        parallelism=1,
    )

    # Policies for VM allocation.
    # Using a general purpose machine type like 'e2-standard-4'.
    # Ensure the specified region supports the machine type.
    allocation_policy = types.AllocationPolicy(
        instances=[
            types.AllocationPolicy.InstancePolicyOrTemplate(
                policy=types.AllocationPolicy.InstancePolicy(machine_type="e2-standard-4")
            ),
        ],
        location=types.AllocationPolicy.LocationPolicy(
            allowed_locations=[f"regions/{region}"]
        ),
    )

    # Define the job itself. The job's resource name is derived by the service
    # from the request's parent and job_id, so it is not set here.
    job = types.Job(
        task_groups=[task_group],
        allocation_policy=allocation_policy,
        labels={
            "environment": "dev",
            "framework": "batch-quickstart",
        },
        logs_policy=types.LogsPolicy(destination=types.LogsPolicy.Destination.CLOUD_LOGGING),
    )

    request = types.CreateJobRequest(
        parent=f"projects/{project_id}/locations/{region}",
        job_id=job_name,  # Needs to be unique per project and region
        job=job,
    )

    response = client.create_job(request=request)
    print(f"Job created: {response.name}")
    return response


if __name__ == "__main__":
    project_id = os.environ.get("GOOGLE_CLOUD_PROJECT", "your-gcp-project-id")
    region = os.environ.get("GOOGLE_CLOUD_REGION", "us-central1")  # Choose an available region
    job_id = os.environ.get("BATCH_JOB_ID", "my-sample-batch-job-1")  # Unique ID for the job

    if project_id == "your-gcp-project-id":
        print("Please set the GOOGLE_CLOUD_PROJECT environment variable or replace 'your-gcp-project-id'.")
    else:
        # A missing region is not an error: note the default and continue.
        if "GOOGLE_CLOUD_REGION" not in os.environ:
            print(f"GOOGLE_CLOUD_REGION is not set; using the default region '{region}'.")
        try:
            created_job = create_simple_container_job(project_id, region, job_id)
            print(f"Monitor job in console: https://console.cloud.google.com/batch/jobs/{region}/{job_id}?project={project_id}")
        except Exception as e:
            print(f"Error creating job: {e}")
            print("Ensure the Batch API is enabled and your service account has 'Batch Job Editor' (roles/batch.jobsEditor) or equivalent permissions.")
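
The quickstart returns as soon as the job is created, not when it finishes. One way to wait for the outcome is to poll the job's status, sketched below. It assumes the client exposes `get_job(name=...)` and that the returned job carries `status.state` as an enum whose `.name` is a string such as `SUCCEEDED` or `FAILED`. The helper `wait_for_job`, its parameters, and the set of terminal states are illustrative, so check them against the API version you use.

```python
import time
from typing import Any, Iterable

# State names assumed from the Batch v1 job status enum; adjust as needed.
TERMINAL_STATES = frozenset({"SUCCEEDED", "FAILED"})


def wait_for_job(
    client: Any,
    job_name: str,
    poll_seconds: float = 10.0,
    timeout_seconds: float = 1800.0,
    terminal_states: Iterable[str] = TERMINAL_STATES,
) -> str:
    """Poll client.get_job until the job reaches a terminal state.

    `job_name` is the full resource name, e.g. the `name` of the job
    returned by create_job. Returns the terminal state's name, or raises
    TimeoutError if the deadline passes first.
    """
    terminal = set(terminal_states)
    deadline = time.monotonic() + timeout_seconds
    while True:
        job = client.get_job(name=job_name)
        state = job.status.state.name
        if state in terminal:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Job {job_name} still in state {state} after {timeout_seconds}s"
            )
        time.sleep(poll_seconds)
```

With the quickstart's objects this could be called as `wait_for_job(batch_v1.BatchServiceClient(), created_job.name)`. Taking the client as a parameter keeps the helper easy to exercise with a stub in tests.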