{"library":"nv-one-logger-training-telemetry","title":"NVIDIA One-Logger Training Telemetry","description":"The `nv-one-logger-training-telemetry` library provides tools for capturing and reporting training job telemetry data, integrating with the `one-logger` ecosystem. It enables standardized logging of metrics, hyperparameters, and system information for AI/ML training runs. The current version is 2.3.1 and it is part of the NVIDIA one-logger project, following its release cadence.","language":"python","status":"active","last_verified":"Mon May 18","install":{"commands":["pip install nv-one-logger-training-telemetry"],"cli":null},"imports":["from nvtelemetry.client import TelemetryClient","from nvtelemetry.config import TelemetryConfig"],"auth":{"required":false,"env_vars":[]},"quickstart":{"code":"from nvtelemetry.client import TelemetryClient\nfrom nvtelemetry.config import TelemetryConfig\nfrom datetime import datetime\nimport os\n\n# Example configuration (adjust as needed for actual usage)\n# In a real environment, project and run_id might be set via environment variables\n# or a more complex configuration management system.\n\n# Using os.environ.get for dynamic values, falling back to defaults for example\nproject_name = os.environ.get(\"ONE_LOGGER_PROJECT_NAME\", \"my_ml_project_example\")\nrun_id = os.environ.get(\"ONE_LOGGER_RUN_ID\", f\"run_{datetime.now().strftime('%Y%m%d%H%M%S')}\")\n\nconfig = TelemetryConfig(\n    project=project_name,\n    model=\"my_model_v1\",\n    run_id=run_id,\n    framework=\"pytorch\",\n    framework_version=\"1.13.1\",\n    container_image=\"nvcr.io/nvidia/pytorch:23.05-py3\",\n    tags={\n        \"experiment\": \"initial_test\",\n        \"dataset\": \"cifar10\"\n    },\n    mlflow_tracking_uri=os.environ.get(\"ONE_LOGGER_MLFLOW_TRACKING_URI\", \"\") # Example for MLflow backend\n)\n\ntry:\n    # Initialize the client. 
This will connect to the configured backend (if any).\n    with TelemetryClient(config=config) as client:\n        print(f\"Telemetry client initialized for project '{config.project}', run: {config.run_id}\")\n\n        # Log hyperparameters\n        client.log_hyperparameters(learning_rate=0.001, batch_size=32, epochs=10)\n        print(\"Logged hyperparameters.\")\n\n        # Log metrics over steps/epochs\n        for epoch in range(3):\n            train_loss = 0.5 - epoch * 0.05\n            val_loss = 0.6 - epoch * 0.08\n            accuracy = 0.7 + epoch * 0.03\n            client.log_metrics(step=epoch, train_loss=train_loss, val_loss=val_loss, accuracy=accuracy)\n            print(f\"Logged metrics for epoch {epoch}: train_loss={train_loss:.3f}, val_loss={val_loss:.3f}, accuracy={accuracy:.3f}\")\n\n        # Log an artifact path (this just records the path, not the artifact itself)\n        client.log_artifact_path(\"model_checkpoint\", \"/path/to/my_model_checkpoint.pt\")\n        print(\"Logged artifact path.\")\n\n        # Log a final message\n        client.log_message(\"Training run completed successfully.\")\n        print(\"Logged completion message.\")\n\nexcept Exception as e:\n    print(f\"An error occurred during telemetry logging: {e}\")\n    print(\"Note: In a real environment, TelemetryClient might require specific endpoint configuration or environment variables (e.g., ONE_LOGGER_MLFLOW_TRACKING_URI, ONE_LOGGER_NEMO_SERVICE_URL) to connect to a telemetry backend like MLflow or NVIDIA NeMo Service. This example primarily demonstrates the API usage, and may not send data to a remote service without proper setup.\")\n","lang":"python","description":"This quickstart demonstrates how to initialize `TelemetryClient` with a `TelemetryConfig` and log common training data such as hyperparameters, metrics over steps, and artifact paths. 
It reads the project name, run ID, and MLflow tracking URI from environment variables, and wraps the client calls in a try/except block so that a failed backend connection is reported instead of crashing the script.","tag":null,"tag_description":null,"last_tested":null,"results":[]},"compatibility":{"tag":null,"tag_description":null,"last_tested":"2026-05-18","installed_version":"2.3.1","pypi_latest":"2.3.1","is_stale":false,"summary":{"python_range":"3.9–3.13","success_rate":100,"avg_install_s":3.6,"avg_import_s":null,"wheel_type":"wheel"},"results":[{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"nv-one-logger-training-telemetry","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"broken","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":"29.2M"},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":"nv-one-logger-training-telemetry","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"broken","install_time_s":4.1,"import_time_s":null,"mem_mb":null,"disk_size":"29M"},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"nv-one-logger-training-telemetry","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"broken","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":"32.0M"},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"nv-one-logger-training-telemetry","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"broken","install_time_s":3.4,"import_time_s":null,"mem_mb":null,"disk_size":"32M"},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":"nv-one-logger-training-telemetry","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"broken","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":"23.6M"},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":"nv-one-logger-training-telemetry","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"broken","install_time_s":2.8,"import_time_s":null,"mem_mb":null,"disk_size":"23M"},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":"nv-one-logger-training-telemetry","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"broken","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":"23.4M"},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"nv-one-logger-training-telemetry","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"broken","install_time_s":2.9,"import_time_s":null,"mem_mb":null,"disk_size":"23M"},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":"nv-one-logger-training-telemetry","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"broken","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":"28.7M"},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"nv-one-logger-training-telemetry","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"broken","install_time_s":4.9,"import_time_s":null,"mem_mb":null,"disk_size":"28M"}]}}