PyDruid

0.6.9 · active · verified Fri Apr 10

PyDruid is a Python connector for Apache Druid that provides a simple API for creating, executing, and analyzing Druid queries. It can convert query results into pandas DataFrame objects, so they can be used directly with the rest of the SciPy stack. The library offers both synchronous and asynchronous clients, implements the Python DB API 2.0, and ships a SQLAlchemy dialect. The current version is 0.6.9, and the project is under active development with regular releases.

Quickstart

This quickstart connects to Druid through the Python DB API 2.0 interface and runs a basic SQL query. Make sure your Druid cluster is running and that the Broker (or Router) is reachable at the configured host and port. The script reads its connection settings from the environment variables DRUID_HOST, DRUID_PORT, DRUID_PATH, and DRUID_SCHEME, and falls back to local defaults when they are unset.

```python
import os
from pydruid.db import connect

# Connection details for your Druid Broker (default port 8082) or Router (default port 8888).
# Each value comes from an environment variable when set, otherwise from a local default.
DRUID_HOST = os.environ.get('DRUID_HOST', 'localhost')
DRUID_PORT = int(os.environ.get('DRUID_PORT', '8082'))
DRUID_PATH = os.environ.get('DRUID_PATH', '/druid/v2/sql/')
DRUID_SCHEME = os.environ.get('DRUID_SCHEME', 'http')

conn = connect(host=DRUID_HOST, port=DRUID_PORT, path=DRUID_PATH, scheme=DRUID_SCHEME)
try:
    curs = conn.cursor()
    # 'wikipedia' is the sample datasource loaded in the Druid tutorials; substitute your own.
    curs.execute("SELECT COUNT(*) FROM wikipedia")
    result = curs.fetchone()
    print(f"Query result: {result}")
except Exception as e:
    print(f"Error executing query against Druid: {e}")
finally:
    # Release the connection and any cursors it opened.
    conn.close()
```
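Because the connection follows DB API 2.0, the standard `cursor.description` attribute can be used to pair each returned value with its column name. The following is a minimal sketch: `fetch_dicts` is a hypothetical helper that relies only on the DB API, so it should work with the cursor returned by `conn.cursor()` as well as with other compliant drivers.

```python
def fetch_dicts(cursor, sql, params=None):
    """Hypothetical helper: run `sql` on a DB-API cursor and return rows as dicts.

    Any placeholders in `sql` must use the driver's `paramstyle`.
    """
    if params is None:
        cursor.execute(sql)
    else:
        cursor.execute(sql, params)
    # description is a sequence of 7-item tuples whose first item is the column name.
    # It may be None when the statement produced no result set.
    if cursor.description is None:
        return []
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]
```

Assuming the tutorial `wikipedia` datasource with a `channel` column, `fetch_dicts(conn.cursor(), "SELECT channel, COUNT(*) AS edits FROM wikipedia GROUP BY channel ORDER BY edits DESC LIMIT 5")` should return up to five dicts with `channel` and `edits` keys.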
