pytd - Treasure Data Driver
pytd is the official Treasure Data driver for Python. It provides a client for querying data, loading data into tables, and managing databases in Treasure Data. The current version is 2.4.0; releases are regular, typically several per year.
Warnings
- breaking: Python 3.8 support was dropped in `pytd` version 2.0.0. Users on Python 3.8 or older must upgrade their Python environment to 3.9+ (3.10+ recommended) to use `pytd >= 2.0.0`.
- breaking: The underlying SQL client for Trino/Presto queries migrated from `presto-python-client` to `trino-python-client` in `pytd` version 2.0.0. `pytd` abstracts this change, but code that uses the underlying driver directly, or environments that depend on `presto-python-client`, may be affected.
- breaking: Version 2.4.0 raised the minimum versions of core dependencies: `td-client >= 1.7.0`, `urllib3 >= 2.0.0`, and `pyarrow >= 14.0.1`. Installing `pytd 2.4.0` alongside older pinned versions of these dependencies will likely cause conflicts.
- gotcha: `pytd` added support for Pandas 2.x in version 1.7.0. Using Pandas 2.x DataFrames with `pytd` versions older than 1.7.0 may cause compatibility issues, so upgrade `pytd` when working with recent Pandas versions.
- gotcha: For large data imports using `Client.load_table_from_dataframe` or `BulkImportWriter`, `pytd` offers parallel uploads through the `max_workers` and `chunk_record_size` parameters, introduced in 1.7.0/1.9.0. Leaving these parameters unset can lead to significantly slower uploads.
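The parallel-upload parameters from the last item can be passed as in the sketch below. It assumes an already constructed `pytd` client and a Pandas DataFrame, and that `load_table_from_dataframe` forwards extra keyword arguments to the bulk import writer. The helper name `upload_in_parallel`, the worker count, and the chunk size are illustrative choices, not recommended values.

```python
def upload_in_parallel(client, dataframe, destination,
                       max_workers=8, chunk_record_size=50_000):
    """Upload a DataFrame via bulk import using parallel, chunked uploads.

    `client` is expected to be a pytd Client and `destination` a
    "database.table" string. The defaults here are illustrative only.
    """
    # Assumption: extra keyword arguments are forwarded to the bulk import
    # writer, which splits the frame into chunks of `chunk_record_size`
    # records and uploads them with up to `max_workers` workers.
    return client.load_table_from_dataframe(
        dataframe,
        destination,
        writer="bulk_import",
        if_exists="overwrite",  # replaces existing data; choose the mode you need
        max_workers=max_workers,
        chunk_record_size=chunk_record_size,
    )
```

With a real client this might be called as `upload_in_parallel(client, df, "my_db.my_table")`, where `my_db.my_table` is a placeholder destination.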
Install
pip install pytd
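Before upgrading to `pytd` 2.4.0, it can help to compare the current environment against the minimums listed under Warnings. The following is a minimal sketch using only the standard library; the helper names `parse_version` and `find_conflicts` are hypothetical, and the version parsing is deliberately simple (leading integers only), so it is not a substitute for pip's own resolver.

```python
import sys
from importlib import metadata

# Minimums taken from the warnings for pytd 2.x / 2.4.0.
MIN_PYTHON = (3, 9)
MIN_PACKAGES = {
    "td-client": (1, 7, 0),
    "urllib3": (2, 0, 0),
    "pyarrow": (14, 0, 1),
}


def parse_version(text):
    """Turn '14.0.1' or '2.0.0rc1' into a 3-tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    while len(parts) < 3:  # pad '2.0' to (2, 0, 0)
        parts.append(0)
    return tuple(parts[:3])


def find_conflicts(python_version=None, installed=None):
    """Return a list of problems; an empty list means none were found.

    `installed` maps package name to version string. When omitted, the
    current environment is inspected. Packages that are not installed
    are skipped, since pip can pick a suitable version for them.
    """
    python_version = tuple(python_version or sys.version_info[:2])
    problems = []
    if python_version < MIN_PYTHON:
        problems.append(f"Python {python_version} is older than {MIN_PYTHON}")
    for name, minimum in MIN_PACKAGES.items():
        if installed is not None:
            found = installed.get(name)
        else:
            try:
                found = metadata.version(name)
            except metadata.PackageNotFoundError:
                found = None
        if found is not None and parse_version(found) < minimum:
            problems.append(f"{name} {found} is older than {minimum}")
    return problems


print(find_conflicts() or "No conflicts found")
```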
Imports
- Client
import pytd
from pytd.client import Client
Quickstart
import os

from pytd.client import Client

# Read the Treasure Data API key and endpoint from environment variables, e.g.:
#   export TD_API_KEY="YOUR_API_KEY"
#   export TD_API_SERVER="https://api.treasuredata.com"
api_key = os.environ.get("TD_API_KEY", "")
api_server = os.environ.get("TD_API_SERVER", "https://api.treasuredata.com")

if not api_key:
    print("Warning: TD_API_KEY environment variable not set. Client may fail to authenticate.")

try:
    # Initialize the client
    client = Client(apikey=api_key, endpoint=api_server)

    # List databases
    databases = client.list_databases()
    print(f"Databases: {[db.name for db in databases]}")

    # Running a query requires an existing database and table in your account.
    if databases:
        first_db_name = databases[0].name
        print(f"Using first database found: {first_db_name}")
        # Example query (uncomment and replace your_table_name_here with a real table).
        # client.query() returns a dict with "columns" and "data" keys, not a cursor:
        # query_result = client.query(f"SELECT count(*) FROM {first_db_name}.your_table_name_here")
        # print(f"Query result count: {query_result['data'][0][0]}")
    else:
        print("No databases found or API key/server invalid.")
except Exception as e:
    print(f"An error occurred: {e}")
    print("Please ensure your TD_API_KEY and TD_API_SERVER are correctly set and accessible, and you have network connectivity.")
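As far as the pytd documentation describes it, `client.query` returns a plain dictionary with a `columns` list and a `data` list of rows, rather than a cursor. Under that assumption, the sketch below turns such a result into one dictionary per row. The helper name `rows_as_dicts` and the sample values are made up for illustration and are not real query output.

```python
def rows_as_dicts(query_result):
    """Convert {"columns": [...], "data": [[...], ...]} into a list of dicts."""
    columns = query_result["columns"]
    return [dict(zip(columns, row)) for row in query_result["data"]]


# Hand-written sample in the assumed result shape (not real query output).
sample = {"columns": ["path", "cnt"], "data": [["/index.html", 3], ["/about", 1]]}
for row in rows_as_dicts(sample):
    print(row)
```

Keeping the conversion in a small helper means the rest of the code can refer to columns by name instead of by position.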