OpenDataLab Python SDK
The OpenDataLab Python SDK (version 0.0.10) is a library for programmatic access to the OpenDataLab platform and its open datasets. It provides a Pythonic interface to platform resources and ships a command-line interface (CLI) tool, `odl`, for common dataset operations. The SDK is a work in progress (WIP); use the latest version, because compatibility across releases is not guaranteed.
Warnings
- breaking The OpenDataLab SDK is explicitly marked as 'WIP' (work in progress), and the developers state that they do 'not ensure the necessary compatibility of OpenAPI and SDK'. Breaking changes can therefore occur frequently, and API stability is not guaranteed even across minor versions.
- gotcha An OpenDataLab account (username and password) is required to access the platform and its datasets, even when using the SDK. Attempting to use the SDK without authentication will fail.
- gotcha The usage examples in the official GitHub README mostly use the `odl` command-line interface (CLI) for tasks such as login and dataset retrieval. A Python SDK exists, but direct Python examples for some common workflows are harder to find than the CLI instructions.
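Because the CLI is the documented path, one option is to drive `odl` from Python with the standard library. The following is a minimal sketch, not an official SDK pattern: the helper names `build_get_command`, `run_cli`, and `download_dataset` are hypothetical, the `-o` flag is taken from the quickstart in this document rather than verified against the CLI, and `odl login` is assumed to prompt for credentials interactively.

```python
import shutil
import subprocess


def build_get_command(dataset_id, destination_path):
    """Return the argv list for `odl get <dataset_id> -o <destination_path>`."""
    if not dataset_id:
        raise ValueError("dataset_id must be a non-empty string")
    # The -o flag follows this document's quickstart; confirm it with `odl get --help`.
    return ["odl", "get", dataset_id, "-o", destination_path]


def run_cli(argv):
    """Run a CLI command, raising if the executable is missing or exits non-zero."""
    if shutil.which(argv[0]) is None:
        raise FileNotFoundError(f"{argv[0]!r} not found on PATH; is the package installed?")
    # check=True raises subprocess.CalledProcessError on a non-zero exit code.
    return subprocess.run(argv, check=True)


def download_dataset(dataset_id, destination_path):
    """Log in (assumed interactive), then download one dataset."""
    run_cli(["odl", "login"])
    run_cli(build_get_command(dataset_id, destination_path))
```

Calling `download_dataset('YOUR_DATASET_ID', './downloaded_dataset')` would run the two CLI steps in order. Passing argument lists instead of a shell string avoids quoting problems in dataset names and paths.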
Install
pip install opendatalab
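Since compatibility across releases is not guaranteed, it can help to confirm which version is installed before relying on behavior described here. This is a small sketch using only the standard library's `importlib.metadata`; the helper names `installed_version` and `check_version` are hypothetical, and `0.0.10` is simply the version this document describes.

```python
from importlib import metadata

EXPECTED_VERSION = "0.0.10"  # the version this document describes


def installed_version(package="opendatalab"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_version(found, expected=EXPECTED_VERSION):
    """Classify the installed version as 'missing', 'match', or 'different'."""
    if found is None:
        return "missing"
    return "match" if found == expected else "different"


print(check_version(installed_version()))
```

A result of 'different' does not mean anything is broken; it only signals that method names and CLI flags should be rechecked against the installed release.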
Imports
- OdlClient
from opendatalab.client import OdlClient
Quickstart
import os
from opendatalab.client import OdlClient
# An OpenDataLab account is required. Register at https://opendatalab.org.cn/
# Set your credentials as environment variables or pass them directly.
USERNAME = os.environ.get('OPEN_DATALAB_USERNAME', 'your_username')
PASSWORD = os.environ.get('OPEN_DATALAB_PASSWORD', 'your_password')
# Initialize the client
odl_client = OdlClient()

try:
    # Authentication: the public docs show login only through the CLI (`odl login`).
    # A login method on OdlClient is likely but not documented in detail, so log in
    # with `odl login` first, or check the SDK source for a client-side login method.
    print(f"Using OpenDataLab account {USERNAME}...")

    # Download a dataset (replace 'YOUR_DATASET_ID' with a real one).
    # No 'list' method was found on OdlClient, so this follows the CLI's `odl get`.
    dataset_id = 'YOUR_DATASET_ID'  # e.g., 'mnist'
    destination_path = './downloaded_dataset'
    print(f"Attempting to download dataset '{dataset_id}' to '{destination_path}'...")

    # This method signature is inferred from the CLI `odl get` and common SDK patterns;
    # check the SDK source or documentation for exact method names and arguments.
    # odl_client.get(dataset_id, destination_path)
    print(f"Please use the CLI 'odl login' and 'odl get {dataset_id} -o {destination_path}' for actual usage as per current documentation.")
    print("The Python SDK offers underlying access, but the CLI is the primary documented interface for these actions.")
except Exception as e:
    print(f"An error occurred: {e}")
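
The quickstart falls back to placeholder credentials when the environment variables are unset, which hides a missing configuration until login fails. A stricter alternative is sketched below; `load_credentials` is a hypothetical helper, and the variable names `OPEN_DATALAB_USERNAME` and `OPEN_DATALAB_PASSWORD` are the ones chosen in this quickstart, not names the SDK is known to read.

```python
import os


def load_credentials(env=None):
    """Read the account name and password from the environment.

    Raises RuntimeError naming every variable that is unset or empty.
    """
    env = os.environ if env is None else env
    names = ("OPEN_DATALAB_USERNAME", "OPEN_DATALAB_PASSWORD")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env[names[0]], env[names[1]]
```

Replacing the two `os.environ.get(...)` lines with `USERNAME, PASSWORD = load_credentials()` would make the script stop early with a clear message instead of attempting to proceed as 'your_username'.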