Google Cloud Dataproc Metastore API client library
1.21.0 · verified Tue May 12 · auth: no · python install: verified · quickstart: verified
Google Cloud Dataproc Metastore is a fully managed, highly available, autohealing, and serverless Apache Hive metastore (HMS) that runs on Google Cloud. It simplifies technical metadata management for data lakes and provides interoperability between various data processing engines like Apache Hive, Apache Spark, and Presto. The `google-cloud-dataproc-metastore` Python client library allows developers to programmatically interact with this service. This library is part of the broader `google-cloud-python` monorepo, which typically sees frequent releases, often weekly for various client libraries.
pip install google-cloud-dataproc-metastore
Common errors
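error ModuleNotFoundError: No module named 'google.cloud.dataproc_metastore'
cause The `google-cloud-dataproc-metastore` package is not installed in the Python environment that runs your code (for example, it was installed into a different interpreter or virtualenv), or the code imports a module path the package does not provide. Despite the distribution name, the package's modules live under `google.cloud.metastore_v1`, not `google.cloud.dataproc_metastore`.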
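Because the distribution name and the import path differ, it can help to check which module paths the current interpreter can actually find before reinstalling anything. The sketch below uses only the standard library; `module_available` is a hypothetical helper name, and `google.cloud.dataproc_metastore` is probed only because it is the path quoted in the error.

```python
import importlib.util


def module_available(dotted_name: str) -> bool:
    """Return True if `dotted_name` can be located by the current interpreter."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        # find_spec imports parent packages first; a missing parent
        # (e.g. no `google.cloud` at all) surfaces as ModuleNotFoundError.
        return False


if __name__ == "__main__":
    for name in ("google.cloud.metastore_v1", "google.cloud.dataproc_metastore"):
        print(f"{name}: {'found' if module_available(name) else 'missing'}")
```

If `google.cloud.metastore_v1` is reported as found, the package is installed and the import statement is what needs fixing.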
fix
Install the library with pip into the same environment that runs your code, and import from `google.cloud.metastore_v1`:
pip install google-cloud-dataproc-metastore
error google.api_core.exceptions.NotFound: 404 Not Found: projects/PROJECT_ID/locations/LOCATION/services/SERVICE_ID not found.
cause The specified Google Cloud Dataproc Metastore service, or another related resource like a backup or import source, could not be found. This often indicates an incorrect project ID, location, service ID, or that the resource does not exist.
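A 404 is often just a typo in one segment of the resource name. One way to rule that out is to build and parse the name in a single place instead of formatting strings ad hoc. This is a minimal stdlib sketch: `service_name` and `parse_service_name` are hypothetical helpers modeled on the `projects/PROJECT_ID/locations/LOCATION/services/SERVICE_ID` format in the error message. Generated clients usually also provide path helpers such as `DataprocMetastoreClient.service_path(project, location, service)`, which are worth preferring when the library is available.

```python
import re

# Mirrors the resource-name format in the 404 message:
# projects/PROJECT_ID/locations/LOCATION/services/SERVICE_ID
_SERVICE_NAME = re.compile(
    r"projects/(?P<project>[^/\s]+)/locations/(?P<location>[^/\s]+)/services/(?P<service>[^/\s]+)"
)


def service_name(project: str, location: str, service: str) -> str:
    """Build a full service resource name, rejecting empty or malformed segments."""
    for label, value in (("project", project), ("location", location), ("service", service)):
        if not value or "/" in value or value != value.strip():
            raise ValueError(f"invalid {label} segment: {value!r}")
    return f"projects/{project}/locations/{location}/services/{service}"


def parse_service_name(name: str) -> dict[str, str]:
    """Split a full service resource name into project, location, and service."""
    match = _SERVICE_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a Dataproc Metastore service name: {name!r}")
    return match.groupdict()
```

For example, passing `service_name(project_id, location, service_id)` as the `name` argument of `client.get_service` rejects a blank or whitespace-padded segment before any request is sent.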
fix
Verify that the `project_id`, `location`, and `service_id` in your code or gcloud command exactly match an existing Dataproc Metastore service. Ensure the resource was created successfully and is in the region you are querying.
error The Dataproc Metastore service agent [SERVICE_AGENT] does not have sufficient IAM permissions to access the network [NETWORK].
cause The Dataproc Metastore service agent (a Google-managed service account) or the caller's identity lacks the necessary Identity and Access Management (IAM) permissions to perform the requested operation, often related to network access or Cloud Storage buckets.
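The service agent's address is derived from the numeric project number, not the project ID, which is an easy detail to get wrong when writing the IAM binding. The sketch below builds the address and a `gcloud projects add-iam-policy-binding` command for review; `metastore_service_agent` and `grant_role_command` are hypothetical helpers, the project number and ID are placeholders, and the role is the one named in the fix for this error.

```python
import shlex


def metastore_service_agent(project_number: str) -> str:
    """Return the Dataproc Metastore service agent address for a project number."""
    if not project_number.isdigit():
        raise ValueError("expected the numeric project number, not the project ID")
    return f"service-{project_number}@gcp-sa-metastore.iam.gserviceaccount.com"


def grant_role_command(project_id: str, member_email: str, role: str) -> list[str]:
    """Build (but do not run) a gcloud command that adds a project-level IAM binding."""
    return [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        f"--member=serviceAccount:{member_email}",
        f"--role={role}",
    ]


if __name__ == "__main__":
    agent = metastore_service_agent("123456789012")  # placeholder project number
    cmd = grant_role_command("my-project", agent, "roles/metastore.serviceAgent")
    # Print for review; run it yourself or with subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```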
fix
Grant the Dataproc Metastore service agent (e.g., `service-PROJECT_NUMBER@gcp-sa-metastore.iam.gserviceaccount.com`) or the user/service account running the operation the required IAM roles, such as `roles/metastore.serviceAgent` in the project or `roles/metastore.user` on the Metastore instance, plus appropriate Cloud Storage permissions if the operation accesses GCS buckets.
error Unable to connect to Hive Metastore (or Connection refused, Host unreachable, Timeout errors)
cause This error typically occurs when a Dataproc cluster or another client cannot establish a network connection to the Dataproc Metastore service. Common reasons include misconfigured VPC Network Peering, incorrect firewall rules, or an improperly specified Metastore endpoint URI.
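Before digging into peering or firewall configuration, a quick TCP probe can show whether the endpoint is reachable at all. The sketch below assumes a Thrift endpoint URI of the form `thrift://HOST:PORT` (gRPC endpoints are not covered), and `parse_thrift_uri` and `can_reach` are hypothetical helpers. Run it from a machine inside the Dataproc workload's network, such as the cluster's master node; a result from your laptop says little about the cluster's network path.

```python
import socket
from urllib.parse import urlparse

DEFAULT_HMS_PORT = 9083  # default Hive Metastore Thrift port


def parse_thrift_uri(uri: str) -> tuple[str, int]:
    """Extract (host, port) from a thrift://HOST:PORT endpoint URI."""
    parsed = urlparse(uri)
    if parsed.scheme != "thrift" or not parsed.hostname:
        raise ValueError(f"expected a thrift://HOST:PORT URI, got {uri!r}")
    return parsed.hostname, parsed.port or DEFAULT_HMS_PORT


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, unreachable host, DNS failure, or timeout
        return False
```

A typical call is `can_reach(*parse_thrift_uri(endpoint_uri))`. `False` points at networking (peering, firewall, wrong host); `True` with continued Hive errors points back at the `hive.metastore.uris` configuration or the protocol.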
fix
Verify that the VPC Network Peering connection between your Dataproc workload's network and the service producer network is active. Check that firewall rules allow outbound traffic from your Dataproc workload to the Metastore port (default 9083). Confirm that the `hive.metastore.uris` Spark property, or the equivalent configuration, uses the correct Dataproc Metastore endpoint URI.
error Current state of resource [RESOURCE_NAME] is not a valid state for this operation. Valid state(s) are [RESOURCE_STATE].
cause You are attempting to perform an operation (e.g., update, import, export, backup, restore) on a Dataproc Metastore service, backup, or import that is not in the required state (e.g., `ACTIVE`).
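When a script starts an operation right after creating or updating a service, polling until the state is `ACTIVE` avoids this error. The helper below is a generic sketch: `wait_for_state`, its default timeout and interval, and treating `ERROR` as terminal are assumptions. It accepts any zero-argument callable that returns the current state name.

```python
import time
from typing import Callable, Iterable


def wait_for_state(
    get_state: Callable[[], str],
    target: str = "ACTIVE",
    failure_states: Iterable[str] = ("ERROR",),
    timeout: float = 1800.0,
    interval: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state() until it returns `target`.

    Raises RuntimeError if a failure state is reached and TimeoutError if
    `timeout` seconds pass first. `sleep` is injectable so tests need not wait.
    """
    failures = set(failure_states)
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        if state == target:
            return state
        if state in failures:
            raise RuntimeError(f"resource entered state {state!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"resource still {state!r} after {timeout:.0f}s")
        sleep(interval)
```

With the client library, the callable could be `lambda: client.get_service(name=name).state.name`, where `client` is a `DataprocMetastoreClient` and `name` is the full service resource name. For operations started through the client library itself, calling `.result()` on the returned long-running operation blocks until it finishes, which is often simpler.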
fix
Wait for the Dataproc Metastore service or associated resource to reach `ACTIVE` (or another state that is valid for the operation) before retrying. You can check the resource's state in the Google Cloud console or with the gcloud CLI.
Warnings
gotcha Dataproc Metastore offers two service versions: Dataproc Metastore 1 and Dataproc Metastore 2. Version 2 provides horizontal scalability and uses a different pricing model. Decide which version you need before creating or configuring a service, because the choice affects both available features and cost.
fix Review the Dataproc Metastore documentation on 'Dataproc Metastore versions' and 'features and benefits' to understand the differences and choose the appropriate service version during creation.
breaking Incompatible Dataproc or Hive Metastore versions can lead to issues. Specifically, Dataproc 3.x versions are incompatible with Dataproc Metastore. Using Dataproc 1.5 with Dataproc Metastore 3.1.2 may also result in backward compatibility problems.
fix Always check the 'Dataproc Metastore version support' documentation for supported Hive patch versions and Dataproc compatibility. For Dataproc 1.5 with Metastore 3.1.2, consider using the auxiliary versions feature.
gotcha Dataproc Metastore services can expose either Apache Thrift or gRPC endpoints. While Thrift is widely used, gRPC is often recommended for integration with newer Google Cloud services like Dataplex. The chosen endpoint protocol must match how clients connect to the service.
fix Specify the `--endpoint-protocol` flag (`grpc` or `thrift`) when creating the service with `gcloud metastore services create`, or set `hive_metastore_config.endpoint_protocol` on the `Service` passed to the client library's `create_service` method. Ensure your connecting applications are configured to use the corresponding protocol.
gotcha Proper authentication is critical for connecting to Google Cloud services. A common footgun is forgetting to set up Application Default Credentials, or forgetting to grant the user or service account the appropriate IAM roles.
fix Ensure you have authenticated via `gcloud auth application-default login` locally, or that the `GOOGLE_APPLICATION_CREDENTIALS` environment variable points to a valid service account key file in production. Grant the necessary IAM roles (e.g., `roles/metastore.editor` or `roles/metastore.admin`) to your principal.
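A quick way to see whether Application Default Credentials has a credentials file to read is to look where it looks. The sketch below checks `GOOGLE_APPLICATION_CREDENTIALS` first, then the file written by `gcloud auth application-default login` (under `CLOUDSDK_CONFIG` if set, otherwise the default gcloud config directory). `find_adc_file` is a hypothetical helper; it ignores credentials served by the metadata server on Compute Engine, GKE, or Cloud Run, so on those platforms `google.auth.default()` is the authoritative check.

```python
import os
from pathlib import Path
from typing import Optional

ADC_FILENAME = "application_default_credentials.json"


def find_adc_file() -> Optional[Path]:
    """Return the credentials file ADC would read, or None if none is usable."""
    env_path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if env_path:
        # An explicit setting wins; if it points at a missing file, ADC fails
        # instead of falling back, so report "nothing usable".
        path = Path(env_path).expanduser()
        return path if path.is_file() else None
    config_dir = os.environ.get("CLOUDSDK_CONFIG")
    if config_dir:
        base = Path(config_dir)
    elif os.name == "nt":
        base = Path(os.environ.get("APPDATA", "")) / "gcloud"
    else:
        base = Path.home() / ".config" / "gcloud"
    candidate = base / ADC_FILENAME
    return candidate if candidate.is_file() else None
```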
gotcha The Dataproc Metastore client does not choose a project for you: every request names the project in its resource path (for example, `parent=projects/PROJECT_ID/locations/LOCATION`). Your code therefore needs the project ID from somewhere, commonly the `GOOGLE_CLOUD_PROJECT` environment variable, as in the quickstart. If it is missing or wrong, requests fail or operate on the wrong project.
fix Set the `GOOGLE_CLOUD_PROJECT` environment variable to your Google Cloud project ID (e.g., `export GOOGLE_CLOUD_PROJECT='your-project-id'`) before running your application, or pass the project ID explicitly when building resource names and API requests.
Install compatibility (verified, last tested: 2026-05-12)
| python | os / libc     | wheel | install | import | disk  |
|--------|---------------|-------|---------|--------|-------|
| 3.9    | alpine (musl) | wheel | -       | 1.72s  | 74.4M |
| 3.9    | alpine (musl) | -     | -       | 1.67s  | 73.4M |
| 3.9    | slim (glibc)  | wheel | 7.3s    | 1.33s  | 72M   |
| 3.9    | slim (glibc)  | -     | -       | 1.15s  | 71M   |
| 3.10   | alpine (musl) | wheel | -       | 1.99s  | 74.3M |
| 3.10   | alpine (musl) | -     | -       | 1.85s  | 73.1M |
| 3.10   | slim (glibc)  | wheel | 6.5s    | 1.13s  | 72M   |
| 3.10   | slim (glibc)  | -     | -       | 1.07s  | 71M   |
| 3.11   | alpine (musl) | wheel | -       | 2.45s  | 79.7M |
| 3.11   | alpine (musl) | -     | -       | 2.83s  | 78.6M |
| 3.11   | slim (glibc)  | wheel | 5.5s    | 1.71s  | 77M   |
| 3.11   | slim (glibc)  | -     | -       | 1.62s  | 76M   |
| 3.12   | alpine (musl) | wheel | -       | 2.58s  | 71.1M |
| 3.12   | alpine (musl) | -     | -       | 2.75s  | 70.0M |
| 3.12   | slim (glibc)  | wheel | 4.7s    | 1.92s  | 69M   |
| 3.12   | slim (glibc)  | -     | -       | 2.35s  | 68M   |
| 3.13   | alpine (musl) | wheel | -       | 2.42s  | 70.7M |
| 3.13   | alpine (musl) | -     | -       | 2.80s  | 69.5M |
| 3.13   | slim (glibc)  | wheel | 4.9s    | 1.89s  | 69M   |
| 3.13   | slim (glibc)  | -     | -       | 2.30s  | 67M   |
Imports
- DataprocMetastoreClient
  from google.cloud.metastore_v1.services.dataproc_metastore import DataprocMetastoreClient
- Service (the Dataproc Metastore service resource)
  from google.cloud.metastore_v1.types import Service
- ListServicesRequest
  from google.cloud.metastore_v1.types import ListServicesRequest
Quickstart (verified, last tested: 2026-04-24)
```python
import os

from google.api_core.exceptions import GoogleAPIError
from google.auth.exceptions import DefaultCredentialsError
from google.cloud.metastore_v1.services.dataproc_metastore import DataprocMetastoreClient
from google.cloud.metastore_v1.types import ListServicesRequest


def list_metastore_services(project_id: str, location: str) -> None:
    """Lists Dataproc Metastore services in a given project and location.

    Args:
        project_id: Your Google Cloud project ID.
        location: The Google Cloud location (e.g., 'us-central1').
    """
    # The resource name of the location where the services are located.
    # Example: "projects/my-project/locations/us-central1"
    parent = f"projects/{project_id}/locations/{location}"
    try:
        # Creating the client resolves Application Default Credentials, which
        # can raise DefaultCredentialsError, so it belongs inside the try block.
        client = DataprocMetastoreClient()
        request = ListServicesRequest(parent=parent)
        # list_services returns a pager that fetches further pages lazily.
        page_result = client.list_services(request=request)
        print(f"Dataproc Metastore services in {parent}:")
        found_services = False
        for service in page_result:
            print(f"- {service.name} (State: {service.state.name})")
            found_services = True
        if not found_services:
            print("  No Dataproc Metastore services found.")
    except (GoogleAPIError, DefaultCredentialsError) as e:
        print(f"Error listing services: {e}")
        print("Ensure the API is enabled, credentials are set, and the location is valid.")


# To run this quickstart:
# 1. Ensure `gcloud auth application-default login` has been run or `GOOGLE_APPLICATION_CREDENTIALS` is set.
# 2. Set the `GOOGLE_CLOUD_PROJECT` environment variable to your project ID.
# 3. Set the `GOOGLE_CLOUD_LOCATION` environment variable to your desired location (e.g., "us-central1").
# Example usage:
# GOOGLE_CLOUD_PROJECT='your-project-id' GOOGLE_CLOUD_LOCATION='us-central1' python your_script_name.py
if __name__ == "__main__":
    project_id = os.environ.get("GOOGLE_CLOUD_PROJECT", "")
    location = os.environ.get("GOOGLE_CLOUD_LOCATION", "")
    if not project_id:
        print("Please set the GOOGLE_CLOUD_PROJECT environment variable.")
    elif not location:
        print("Please set the GOOGLE_CLOUD_LOCATION environment variable.")
    else:
        list_metastore_services(project_id, location)
```