ODPS Python SDK and data analysis framework
PyODPS is the official Python SDK for Alibaba Cloud's MaxCompute (formerly ODPS), providing an elegant way to access MaxCompute APIs. It supports basic operations on MaxCompute objects and includes a DataFrame framework for streamlined data analysis. The library is currently at version 0.12.6 and is actively maintained, with regular releases that add new features, enhancements, and bug fixes.
Warnings
- breaking The class `odps.accounts.AliyunAccount` was renamed to `odps.account.CloudAccount`. Code directly importing or referencing the old name will break.
- breaking MaxCompute V4 signature is enabled by default, which may cause issues with services or environments that do not support it.
- breaking Client-side precision and scale checks for Decimal values have been tightened to match MaxCompute server-side behavior. Values that previously passed the client-side check may now be rejected.
- gotcha When using third-party packages in Python UDFs for MaxCompute, import statements for these packages must be placed *inside* the UDF's `evaluate` method (or similar processing method). Placing them at the module level will lead to runtime errors because the package is only available within the execution context on the MaxCompute server.
- gotcha The PyODPS `execute_sql()` and `run_sql()` methods are intended for DDL and DML statements, including queries. Some statements that the MaxCompute client accepts cannot be executed this way: use `run_security_query()` for security statements such as `GRANT` or `REVOKE`, and `run_xflow()` or `execute_xflow()` for API commands.
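- gotcha Downloading a large table or query result in full to a local machine with PyODPS (for example, collecting every record into a list or calling `to_pandas()` on a large DataFrame) can cause Out-Of-Memory (OOM) errors, because MaxCompute data volumes can far exceed local memory. Keep heavy processing inside MaxCompute where possible, or read and process records incrementally.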
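One way to act on the OOM warning above is to consume a table reader lazily and handle records in fixed-size batches, so only one batch needs to be held in memory at a time. The helper below is a hypothetical sketch, not a PyODPS API: it works on any iterable, and the usage comment assumes that a PyODPS table reader can be iterated record by record (as in `for record in reader:`). The table name, column name, and batch size are placeholders.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(records: Iterable[T], batch_size: int = 10000) -> Iterator[List[T]]:
    """Hypothetical helper: yield lists of at most `batch_size` items.

    Items are pulled lazily from `records`, so only the current batch is
    materialized. Being a generator, the ValueError for a bad batch_size
    is raised when iteration starts.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    iterator = iter(records)
    while True:
        # Take up to batch_size items without touching the rest of the stream
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical usage with PyODPS (assumes `o` is an ODPS entry object;
# 'my_large_table' and 'amount' are placeholder names):
#
# total = 0
# with o.get_table('my_large_table').open_reader() as reader:
#     for batch in iter_batches(reader, batch_size=10000):
#         total += sum(record['amount'] for record in batch)
```

The batch size trades memory use against per-batch overhead; aggregations that MaxCompute can compute itself are usually better pushed into SQL or a PyODPS DataFrame expression instead.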
Install
- pip install pyodps
- pip install pyodps[full]
Imports
- ODPS
from odps import ODPS
- DataFrame
from odps.df import DataFrame
- options
from odps import options
Quickstart
import os
from odps import ODPS
# Read credentials and connection settings from environment variables instead of hard-coding them:
# ALIBABA_CLOUD_ACCESS_KEY_ID, ALIBABA_CLOUD_ACCESS_KEY_SECRET, ODPS_PROJECT, ODPS_ENDPOINT
access_id = os.environ.get('ALIBABA_CLOUD_ACCESS_KEY_ID', 'your-access-id')
secret_access_key = os.environ.get('ALIBABA_CLOUD_ACCESS_KEY_SECRET', 'your-secret-access-key')
project = os.environ.get('ODPS_PROJECT', 'your-project')
endpoint = os.environ.get('ODPS_ENDPOINT', 'your-endpoint')
# Initialize ODPS object
o = ODPS(access_id, secret_access_key, project=project, endpoint=endpoint)
# Get a table object
table = o.get_table('dual')
# Print the table name and schema
print(f"Table Name: {table.name}")
print(f"Table Schema: {table.table_schema}")
print("First 5 records:")
with table.open_reader() as reader:
    # Slice the reader so that only the first 5 records are fetched
    for record in reader[0:5]:
        print(record)
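The quickstart falls back to placeholder strings when a variable is unset, so a missing credential only surfaces later as an authentication or connection error. A minimal sketch of a fail-fast alternative follows. `load_odps_config` is a hypothetical helper, not part of PyODPS; it assumes only the four environment variable names used in the quickstart.

```python
import os
from typing import Dict, Mapping, Optional

# Maps each environment variable used in the quickstart to the setting it supplies
_ENV_TO_KEY = {
    "ALIBABA_CLOUD_ACCESS_KEY_ID": "access_id",
    "ALIBABA_CLOUD_ACCESS_KEY_SECRET": "secret_access_key",
    "ODPS_PROJECT": "project",
    "ODPS_ENDPOINT": "endpoint",
}


def load_odps_config(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: collect connection settings or raise if any are missing.

    `env` defaults to os.environ; passing a plain dict makes the function
    easy to test. Empty values are treated as missing.
    """
    source = os.environ if env is None else env
    missing = [name for name in _ENV_TO_KEY if not source.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: source[name] for name, key in _ENV_TO_KEY.items()}


# Usage with PyODPS, mirroring the quickstart call:
# from odps import ODPS
# cfg = load_odps_config()
# o = ODPS(cfg["access_id"], cfg["secret_access_key"],
#          project=cfg["project"], endpoint=cfg["endpoint"])
```

Raising before the `ODPS` object is created names the exact variables that are missing, which is easier to diagnose than a rejected request signed with a placeholder key.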