NVIDIA GPU Cloud SDK
The NVIDIA GPU Cloud (NGC) SDK is a Python library that enables developers to integrate their applications and utilities with NVIDIA GPU Cloud. It provides programmatic access to NGC services, allowing users to manage resources, run jobs, and interact with the NGC catalog. The library is currently at version 4.17.0 and sees active development with frequent releases.
Common errors
- Url: https://authn.nvidia.com/token?service=ngc& is not reachable.
  cause: The NGC authentication endpoint is unreachable, often due to network restrictions, firewall rules, or incorrect proxy settings.
  fix: Check your internet connectivity. If you are behind a corporate proxy, set the `HTTPS_PROXY` and `HTTP_PROXY` environment variables to your proxy server address (e.g., `export HTTPS_PROXY="http://your.proxy.com:port"`).
- socket.gaierror: [Errno -2] Name or service not known
  cause: DNS resolution failed, meaning the system cannot resolve the hostname of an NGC service to an IP address.
  fix: Verify your DNS configuration and ensure your system can resolve external hostnames. Proxy issues can also cause this error; try the `HTTPS_PROXY` fix mentioned above if applicable.
- ERROR: Wheel 'ngcsdk' located at ... is invalid. OR packaging.requirements.InvalidRequirement: Invalid specifier: '~=0.1.52'
  cause: These errors can indicate a problem with the downloaded wheel file (e.g., corrupted, incompatible architecture) or, more commonly, dependency conflicts with other installed libraries (e.g., `langchain-core` requiring a specific version of `packaging` that conflicts with `ngcsdk`'s transitive dependencies).
  fix: For invalid wheel errors, install `ngcsdk` directly via `pip install ngcsdk` rather than from a manually downloaded wheel, and confirm that your Python environment is healthy. For dependency conflicts, try installing `ngcsdk` in a clean virtual environment first. If conflicts persist with other libraries, inspect dependency trees (`pip show ngcsdk` and `pipdeptree`) to identify the conflicting packages and adjust versions accordingly.
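The first two errors can be hard to tell apart from the SDK's output alone. The following is a minimal diagnostic sketch, not part of `ngcsdk`, that uses only the Python standard library. It assumes the hostname from the error message (`authn.nvidia.com`) and port 443; the helper names `proxy_settings` and `diagnose` are hypothetical.

```python
import os
import socket

# Hostname taken from the error message above; the port is an assumption (HTTPS).
NGC_AUTH_HOST = "authn.nvidia.com"


def proxy_settings(environ=os.environ):
    """Return the proxy-related variables that are set, checking both spellings."""
    names = ("HTTPS_PROXY", "https_proxy", "HTTP_PROXY", "http_proxy",
             "NO_PROXY", "no_proxy")
    return {name: environ[name] for name in names if environ.get(name)}


def diagnose(host=NGC_AUTH_HOST, port=443, timeout=5.0,
             resolve=socket.getaddrinfo, connect=socket.create_connection):
    """Classify the problem as a DNS failure, a TCP reachability failure, or neither."""
    try:
        # Step 1: name resolution (fails with socket.gaierror on DNS problems).
        resolve(host, port)
    except socket.gaierror as exc:
        return f"dns-failure: {exc}"
    try:
        # Step 2: direct TCP connection to the resolved host.
        conn = connect((host, port), timeout=timeout)
    except OSError as exc:
        return f"unreachable: {exc}"
    conn.close()
    return "ok"


# Example use:
#   print(proxy_settings() or "no proxy variables set")
#   print(diagnose())
```

A `dns-failure` result points at DNS configuration, while `unreachable` points at a firewall or routing problem. On networks where all traffic must go through a proxy, a direct connection is expected to fail, so an `unreachable` result together with an empty `proxy_settings()` suggests the proxy variables are missing.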
Warnings
- gotcha The `ngcsdk` client configuration (`clt.configure()`) writes settings to user files. Ensure API keys are handled securely and not hardcoded. For automated environments, prefer using environment variables.
- gotcha Network connectivity issues, corporate proxies, or incorrect DNS configurations can lead to 'URL not reachable' or 'Name or service not known' errors during SDK operations, especially during client configuration or authentication with NGC services.
- gotcha No breaking change has been explicitly announced for the SDK itself, but the NGC ecosystem is updated frequently. NVIDIA recommends using specific version tags for NGC containers in production rather than `:latest`. The same practice applies to the SDK: consider pinning the `ngcsdk` version to avoid unexpected behavior changes.
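Pinning is usually done in a requirements file (for example, a line such as `ngcsdk==4.17.0`). As a complement, a start-up check can confirm that the installed version matches the pin. The sketch below uses the standard library's `importlib.metadata`; the helper name `check_pin` is hypothetical, and the pinned value is simply the version mentioned in this document, not a recommendation.

```python
from importlib import metadata

# Version stated in this document; replace with the release you have validated.
PINNED_VERSION = "4.17.0"


def check_pin(package="ngcsdk", expected=PINNED_VERSION,
              get_version=metadata.version):
    """Return (ok, message) describing whether the installed version matches the pin."""
    try:
        installed = get_version(package)
    except metadata.PackageNotFoundError:
        return False, f"{package} is not installed"
    if installed != expected:
        return False, f"{package} {installed} installed, expected {expected}"
    return True, f"{package} {installed} matches the pin"


# Example use:
#   ok, message = check_pin()
#   print(message)
```

The `get_version` parameter exists only so the check can be exercised without the package installed; in normal use the default is sufficient.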
Install
- `pip install ngcsdk`
Imports
- Client: `from ngcsdk import Client`
Quickstart
import os
import sys

from ngcsdk import Client

# Read the API key from the environment rather than hardcoding it.
# The placeholder default only serves as a sentinel for the check below.
NGC_API_KEY = os.environ.get('NGC_API_KEY', 'YOUR_NGC_API_KEY_HERE')

if not NGC_API_KEY or NGC_API_KEY == 'YOUR_NGC_API_KEY_HERE':
    print("Error: NGC_API_KEY environment variable not set or placeholder used. Please provide a valid API key from ngc.nvidia.com/setup.")
    sys.exit(1)

try:
    # Initialize the NGC SDK client
    clt = Client()

    # Configure the client. This will set user settings for future operations.
    # Replace 'nvidia' and 'no-team' with your actual NGC organization and team names.
    # 'ace_name' can often be 'no-ace' or a specific ACE if your organization uses one.
    clt.configure(api_key=NGC_API_KEY, org_name='nvidia', team_name='no-team', ace_name='no-ace')
    print("Successfully configured NGC SDK client.")

    # Verify current configuration
    current_config = clt.current_config()
    print("Current NGC configuration:")
    for item in current_config:
        print(f"  {item['key']}: {item['value']} (Source: {item['source']})")

    # Example: List available ACEs (Accelerated Compute Environments)
    # This assumes 'basecommand' is available and configured.
    # Note: Access to specific commands like 'aces' depends on your NGC permissions and setup.
    # try:
    #     aces_info = clt.basecommand.aces.list()
    #     print("\nAvailable ACEs:")
    #     for ace in aces_info.values():
    #         print(f"  - {ace['name']}")
    # except AttributeError:
    #     print("\nCould not access 'aces' command. Check permissions or client setup.")
except Exception as e:
    print(f"An error occurred: {e}")