{"id":8202,"library":"google-cloud-mldiagnostics","title":"ML Diagnostics Python SDK","description":"The `google-cloud-mldiagnostics` library is the Python SDK for Google Cloud's ML Diagnostics platform. It integrates with machine learning workloads to collect and manage workload metrics, configurations, and profiles, and enables programmatic and on-demand profile capture. It helps users create and monitor machine learning runs, deploy managed XProf resources for performance profiling, and visualize various workload aspects on Google Cloud. The library is actively maintained, with frequent updates aligning with new features and improvements in Google Cloud services.","status":"active","version":"1.0.2","language":"en","source_language":"en","source_url":"https://github.com/GoogleCloudPlatform/google-cloud-python","tags":["Google Cloud","ML Diagnostics","Machine Learning","Profiling","Metrics","XProf","Python"],"install":[{"cmd":"pip install google-cloud-mldiagnostics","lang":"bash","label":"Install latest version"},{"cmd":"pip install google-cloud-logging","lang":"bash","label":"Install recommended logging dependency"}],"dependencies":[{"reason":"Recommended for routing SDK logs, metrics, and configs to Cloud Logging for visualization and analysis.","package":"google-cloud-logging","optional":true}],"imports":[{"symbol":"machinelearning_run","correct":"from google_cloud_mldiagnostics import machinelearning_run"},{"symbol":"metrics","correct":"from google_cloud_mldiagnostics import metrics"},{"symbol":"xprof","correct":"from google_cloud_mldiagnostics import xprof"},{"symbol":"MLRun","correct":"from google_cloud_mldiagnostics.proto.diagnostics import MLRun"},{"symbol":"MetricType","correct":"from google_cloud_mldiagnostics.proto.diagnostics import MetricType"}],"quickstart":{"code":"import os\nimport logging\nimport google.cloud.logging\nfrom google_cloud_mldiagnostics import machinelearning_run\nfrom google_cloud_mldiagnostics import metrics\nfrom 
google_cloud_mldiagnostics import xprof\nfrom google_cloud_mldiagnostics.proto.diagnostics import MetricType\n\n# Set up Cloud Logging (recommended)\nlogging_client = google.cloud.logging.Client()\nlogging_client.setup_logging()\nlogging.info(\"Cloud Logging is set up.\")\n\n# Authenticate first: set GOOGLE_APPLICATION_CREDENTIALS or run `gcloud auth application-default login`\nPROJECT_ID = os.environ.get('GCP_PROJECT_ID', 'your-gcp-project-id')\n\n\ndef run_ml_diagnostics_example():\n    print(f\"Using GCP Project ID: {PROJECT_ID}\")\n\n    # 1. Create a machine learning run\n    # The SDK automatically generates a unique run_id if not provided.\n    run_name = machinelearning_run.create_run(\n        project_id=PROJECT_ID,\n        experiment_name=\"my-first-experiment\",\n        display_name=\"my-training-run\"\n    )\n    print(f\"Created ML Run: {run_name}\")\n\n    # 2. Record metrics\n    metrics.record(MetricType.LOSS, 0.5, step=1, run_name=run_name)\n    metrics.record(MetricType.ACCURACY, 0.8, step=1, run_name=run_name)\n    print(\"Recorded initial metrics.\")\n\n    metrics.record(MetricType.LOSS, 0.2, step=10, run_name=run_name)\n    metrics.record(MetricType.ACCURACY, 0.95, step=10, run_name=run_name)\n    print(\"Recorded updated metrics.\")\n\n    # 3. Write configurations (example)\n    machinelearning_run.write_config(run_name, {\"learning_rate\": 0.01, \"batch_size\": 32})\n    print(\"Wrote run configurations.\")\n\n    # Example of capturing a profile (requires an XProf server running in your workload)\n    # For on-demand capture, ensure xprof.start_server() is called in your ML workload.\n    # xprof.capture_profile(run_name, 'gs://your-bucket/profiles', duration_ms=10000)\n    # print(\"Attempted to capture profile.\")\n\n    print(\"ML Diagnostics example completed. 
Check the Google Cloud Console for 'my-training-run'.\")\n\nif __name__ == '__main__':\n    run_ml_diagnostics_example()","lang":"python","description":"This quickstart demonstrates how to integrate `google-cloud-mldiagnostics` into your Python ML workload. It sets up Cloud Logging, creates a machine learning run, records sample metrics, and writes configuration data. Ensure the `GCP_PROJECT_ID` environment variable is set, or replace `'your-gcp-project-id'` with your actual Google Cloud project ID. Authentication typically relies on Application Default Credentials (e.g., via `gcloud auth application-default login`)."},"warnings":[{"fix":"Remove `google-cloud` from your project's dependencies and run `pip install google-cloud-mldiagnostics`, along with any other product-specific `google-cloud-*` libraries you need.","message":"The generic `google-cloud` package is deprecated. Install product-specific packages such as `google-cloud-mldiagnostics` instead of the umbrella package to avoid issues and ensure up-to-date functionality.","severity":"breaking","affected_versions":"<=0.34.0 of google-cloud"},{"fix":"Verify compatibility with your specific ML framework and hardware setup. Refer to the official documentation for the latest support matrix.","message":"The ML Diagnostics SDK for Python currently officially supports only JAX on TPUs. Other frameworks (e.g., TensorFlow, PyTorch) and hardware (e.g., GPUs, CPUs) may be unsupported or have limitations.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Install `google-cloud-logging` and add `import google.cloud.logging; logging_client = google.cloud.logging.Client(); logging_client.setup_logging()` to your script as shown in the quickstart.","message":"To route SDK logs, metrics, and configuration information to Google Cloud Logging, you must explicitly install and configure the `google-cloud-logging` library in your application. 
Without this, SDK output will only go to standard Python logging, not Cloud Logging.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Ensure your environment is properly authenticated. The recommended approach is to use Application Default Credentials (ADC), either by running `gcloud auth application-default login` or by setting the `GOOGLE_APPLICATION_CREDENTIALS` environment variable to the path of a service account key file.","message":"Authentication to Google Cloud services is required. Incorrect or missing authentication credentials (e.g., `GOOGLE_APPLICATION_CREDENTIALS` not set, or `gcloud auth application-default login` not run) will lead to permission errors when the SDK attempts to interact with Google Cloud APIs.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Ensure the `GOOGLE_APPLICATION_CREDENTIALS` environment variable points to a valid service account key JSON file with appropriate roles (e.g., 'ML Diagnostics Editor' or 'Owner') for the project, or run `gcloud auth application-default login` and ensure the authenticated user has sufficient permissions.","cause":"The Python environment running the code does not have the necessary Google Cloud authentication credentials, or the authenticated principal lacks permissions to create/manage ML Diagnostics resources in the specified project.","error":"google.api_core.exceptions.PermissionDenied: 403 Permission denied to access project..."},{"fix":"Install the library using `pip install google-cloud-mldiagnostics`.","cause":"The `google-cloud-mldiagnostics` library is not installed in the Python environment.","error":"ModuleNotFoundError: No module named 'google_cloud_mldiagnostics'"},{"fix":"Always pass the `project_id` parameter to `create_run()`. 
It's recommended to retrieve it from an environment variable like `GCP_PROJECT_ID` or explicitly provide it, e.g., `machinelearning_run.create_run(project_id='your-project-id', ...)`.","cause":"When calling `machinelearning_run.create_run()`, the `project_id` argument was not provided. All ML Diagnostics operations require a target Google Cloud project.","error":"TypeError: create_run() missing 1 required positional argument: 'project_id'"}]}