{"id":8899,"library":"cloudml-hypertune","title":"CloudML HyperTune","description":"Cloudml-hypertune is a lightweight Python library providing helper functions to report hyperparameter tuning metrics to Google Cloud's Vertex AI (formerly Cloud ML Engine). It enables the hyperparameter tuning service to track and optimize model training trials by collecting objective metrics. Although version `0.1.0.dev6` is quite old (last released December 2019), it remains the standard way to report custom metrics for hyperparameter tuning on Google Cloud.","status":"active","version":"0.1.0.dev6","language":"en","source_language":"en","source_url":"http://github.com/GoogleCloudPlatform/cloudml-hypertune","tags":["machine-learning","google-cloud","hyperparameter-tuning","vertex-ai","cloud-ml-engine"],"install":[{"cmd":"pip install cloudml-hypertune","lang":"bash","label":"Install latest version"}],"dependencies":[],"imports":[{"note":"The primary class 'HyperTune' is typically accessed via the top-level 'hypertune' module, not directly imported from 'cloudml_hypertune'.","wrong":"from cloudml_hypertune import HyperTune","symbol":"HyperTune","correct":"import hypertune\nhpt = hypertune.HyperTune()"}],"quickstart":{"code":"import hypertune\nimport argparse\n\ndef train_model(learning_rate, num_epochs, metric_tag):\n    # Simulate model training with hyperparameters\n    # In a real scenario, this would be your ML training loop\n    print(f\"Training with learning_rate={learning_rate}, num_epochs={num_epochs}\")\n    \n    # Simulate a metric, e.g., validation accuracy\n    # In a real scenario, you'd get this from your model's evaluation\n    metric_value = 0.5 + (learning_rate * 0.1) + (num_epochs * 0.01)\n    \n    # Report the metric to Cloud AI Platform / Vertex AI\n    hpt = hypertune.HyperTune()\n    hpt.report_hyperparameter_tuning_metric(\n        hyperparameter_metric_tag=metric_tag, # Must match config.yaml objective metricTag\n        metric_value=metric_value,\n        global_step=num_epochs # Or current training step\n    )\n    print(f\"Reported metric '{metric_tag}': {metric_value} at step {num_epochs}\")\n\nif __name__ == '__main__':\n    parser = argparse.ArgumentParser()\n    # Define hyperparameters as command-line arguments\n    parser.add_argument(\n        '--learning_rate',\n        type=float,\n        default=0.01,\n        help='Learning rate for training.'\n    )\n    parser.add_argument(\n        '--num_epochs',\n        type=int,\n        default=10,\n        help='Number of epochs for training.'\n    )\n    parser.add_argument(\n        '--metric_tag',\n        type=str,\n        default='accuracy',\n        help='Tag for the metric reported to HyperTune.'\n    )\n    args = parser.parse_args()\n    \n    train_model(args.learning_rate, args.num_epochs, args.metric_tag)\n","lang":"python","description":"This quickstart demonstrates how to use `cloudml-hypertune` within a training script to report a metric. The script accepts hyperparameters as command-line arguments, which is a critical requirement for Google Cloud's hyperparameter tuning service to inject trial-specific values. The `HyperTune` instance then reports the objective metric and the current training step. You would run this script within a container on Vertex AI/Cloud ML Engine."},"warnings":[{"fix":"Ensure your training job is submitted to Vertex AI (or the deprecated Cloud ML Engine) with a hyperparameter tuning configuration.","message":"The `cloudml-hypertune` library is specifically designed to work within Google Cloud's hyperparameter tuning services (Vertex AI/AI Platform). It is not a standalone hyperparameter tuning framework and cannot drive optimization on its own outside that environment.","severity":"gotcha","affected_versions":"All"},{"fix":"Modify your training script to parse hyperparameters as `argparse` arguments, ensuring they are named consistently with your tuning job configuration.","message":"Hyperparameters you wish to tune *must* be exposed as command-line arguments in your training script. The Vertex AI tuning service passes trial-specific hyperparameter values via these arguments, not directly to the `hypertune` library.","severity":"breaking","affected_versions":"All"},{"fix":"Carefully verify that the metric tag string in your code is identical to the one in your Vertex AI hyperparameter tuning job configuration.","message":"The `hyperparameter_metric_tag` passed to `hpt.report_hyperparameter_tuning_metric` must exactly match the `metric_id` (or `hyperparameterMetricTag` in older configs) specified in your Vertex AI hyperparameter tuning job configuration. A mismatch will result in trials failing to report metrics or the tuning job not recognizing the objective.","severity":"gotcha","affected_versions":"All"},{"fix":"No direct fix needed, but be aware of its static nature. If you encounter unexpected behavior with bleeding-edge Python or ML frameworks, consider whether the issue is external to `cloudml-hypertune`.","message":"Version `0.1.0.dev6` was released in December 2019. While still functional and referenced in Google Cloud documentation for Vertex AI, its lack of recent updates can give the impression of abandonment or raise concerns about compatibility with very new Python features; in practice it is a stable, minimal utility.","severity":"gotcha","affected_versions":"All"},{"fix":"Implement robust error handling and numeric stability checks (e.g., clipping gradients, checking for `NaN`s) in your training code. Ensure `report_hyperparameter_tuning_metric` is called even if training terminates early due to an error, or within a `finally` block if a fallback metric can be determined.","message":"Training runs that result in `NaN` values in loss functions or other unhandled exceptions will cause hyperparameter tuning trials to fail. This not only wastes resources but also prevents the tuning algorithm from learning from that trial.","severity":"gotcha","affected_versions":"All"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Verify that the `hyperparameter_metric_tag` used in `hpt.report_hyperparameter_tuning_metric()` exactly matches the `metric_id` specified in your Vertex AI hyperparameter tuning job configuration. Also ensure the `report_hyperparameter_tuning_metric` call is actually reached and executed during training.","cause":"The hyperparameter tuning service could not retrieve the objective metric from the training job. This often happens due to a mismatch between the `hyperparameter_metric_tag` in your code and the metric configuration in Vertex AI.","error":"Hyperparameter tuning failed"},{"fix":"Ensure `hpt.report_hyperparameter_tuning_metric` is called with a valid `metric_value` and `hyperparameter_metric_tag` at the end of each evaluation step or epoch. For non-TensorFlow models or custom evaluation loops, `cloudml-hypertune` is the definitive way to report metrics.","cause":"The training code might be completing without explicitly reporting a metric, or the metric is being reported incorrectly. Older TensorFlow Estimator-based training might not be correctly outputting to event files in a format recognized by the tuning service if `cloudml-hypertune` isn't used.","error":"Trials show status 'Failed' but logs don't show Python errors"},{"fix":"Debug your training script to ensure `hpt.report_hyperparameter_tuning_metric` is called. Add logging around this call to confirm its execution and the values being passed. Double-check the `hyperparameter_metric_tag` against your Vertex AI job configuration.","cause":"The `cloudml-hypertune` library's `report_hyperparameter_tuning_metric` function was not invoked or did not successfully transmit the metric. This could be due to an uncaught exception in the training code preventing the call, or an incorrect metric tag.","error":"Google Cloud ML Engine does not return objective values when hyperparameter tuning"}]}