PyDruid
PyDruid is a Python connector for Apache Druid that provides a simple API for creating, executing, and analyzing Druid queries. It can parse query results into Pandas DataFrame objects for integration with the SciPy stack. The library offers both synchronous and asynchronous clients, implements the Python DB API 2.0, and includes a SQLAlchemy dialect. At the time of writing, the current version is 0.6.9, and the project is under active development with regular releases.
Warnings
- breaking For Druid SQL 0.13.0 and later, request the column-header row so that column names are inferred correctly, especially for empty result sets: pass `header=True` to `connect()`, or add `?header=true` to the SQLAlchemy URI. PyDruid defaults to `header=False` for backward compatibility, and without the header row the expected column names cannot be inferred from a result set that contains no rows.
- breaking pydruid versions before 0.6.0 are incompatible with Python 3.8.2 and newer: they access `collections.abc` as an attribute of `collections` without importing it explicitly, which fails with `AttributeError: module 'collections' has no attribute 'abc'`. Upgrade to pydruid 0.6.0 or later.
- deprecated Calling `export_pandas()` or `export_tsv()` directly on a `BaseDruidClient` (for example, a `PyDruid` instance) is deprecated. Call these methods on the `Query` object returned by query methods such as `timeseries()`, `topn()`, or `groupby()` instead.
- gotcha `sqlalchemy` is an optional dependency of `pydruid`. A plain `pip install pydruid` does not install it, so the SQLAlchemy dialect will not work until you also install the `[sqlalchemy]` extra (e.g., `pip install "pydruid[sqlalchemy]"`).
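The header and SQLAlchemy warnings above both come down to building the right connection URI. The sketch below uses a hypothetical helper, `druid_sqlalchemy_uri` (not part of pydruid), that assembles a URI with `?header=true`. It assumes the dialect names `druid` (HTTP) and `druid+https` (HTTPS) and assumes the dialect checks for the lowercase string `true`; verify both against your installed pydruid version.

```python
from urllib.parse import urlencode


def druid_sqlalchemy_uri(host="localhost", port=8082, path="/druid/v2/sql/",
                         scheme="http", header=True):
    """Build a SQLAlchemy URI for pydruid's dialect (hypothetical helper)."""
    if scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {scheme!r}")
    # Assumed dialect names: 'druid' connects over HTTP, 'druid+https' over HTTPS
    dialect = "druid+https" if scheme == "https" else "druid"
    # Assumed: the dialect enables the header row only for the lowercase value 'true'
    query = ("?" + urlencode({"header": "true"})) if header else ""
    return f"{dialect}://{host}:{port}{path}{query}"


if __name__ == "__main__":
    print(druid_sqlalchemy_uri())
    print(druid_sqlalchemy_uri(scheme="https", port=8282))
```

With the `[sqlalchemy]` extra installed, the resulting string can be passed to `sqlalchemy.create_engine()`. When using the DB API directly, the equivalent setting is `connect(..., header=True)`.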
Install
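- Minimal install
pip install pydruid
- With all optional extras (async client, pandas export, SQLAlchemy dialect, CLI); the quotes stop shells such as zsh from expanding the brackets
pip install "pydruid[async,pandas,sqlalchemy,cli]"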
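To check which optional extras are usable in the current environment, you can test whether their dependencies are importable. This is a minimal sketch; the mapping from extras to module names (`tornado`, `pandas`, `sqlalchemy`, `prompt_toolkit`, `pygments`, `tabulate`) is an assumption, so confirm it against pydruid's packaging metadata.

```python
import importlib.util

# Assumed mapping of pydruid extras to the top-level modules they install;
# check pydruid's packaging metadata for the authoritative list.
EXTRA_MODULES = {
    "async": ["tornado"],
    "pandas": ["pandas"],
    "sqlalchemy": ["sqlalchemy"],
    "cli": ["prompt_toolkit", "pygments", "tabulate"],
}


def installed_extras(extra_modules=EXTRA_MODULES):
    """Return {extra: bool}, True when every module of that extra is importable."""
    return {
        # find_spec returns None for a missing top-level module instead of raising
        extra: all(importlib.util.find_spec(module) is not None for module in modules)
        for extra, modules in extra_modules.items()
    }


if __name__ == "__main__":
    for extra, ok in installed_extras().items():
        print(f"{extra:<10} {'installed' if ok else 'missing'}")
```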
Imports
- PyDruid
from pydruid.client import PyDruid
- connect
from pydruid.db import connect
- doublesum
from pydruid.utils.aggregators import doublesum
- Filter
from pydruid.utils.filters import Filter
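`PyDruid`, `doublesum`, and `Filter` are used to describe Druid native queries. The sketch below writes out, with the standard library only, roughly the native timeseries query that a `doublesum` aggregation plus a selector `Filter` describe, and shows how it could be POSTed to a Broker's native endpoint. The helpers `timeseries_body` and `post_native_query` are hypothetical. The `wikipedia` datasource, its `channel` and `added` columns, and the interval are assumptions based on the Druid tutorial data. pydruid's own serialization may differ in detail. The commented pydruid call is an illustration and is not executed.

```python
import json
import urllib.request


def timeseries_body(datasource, interval, metric, dimension, value,
                    granularity="day"):
    """Build a native timeseries query with one doubleSum and a selector filter.

    Hypothetical helper: it spells out the JSON by hand to show its shape.
    """
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": granularity,
        "intervals": [interval],
        # Selector filter: keep only rows where `dimension` equals `value`
        "filter": {"type": "selector", "dimension": dimension, "value": value},
        # doubleSum aggregator, named after the metric it sums
        "aggregations": [{"type": "doubleSum", "name": metric, "fieldName": metric}],
    }


def post_native_query(body, url="http://localhost:8082/druid/v2/"):
    """POST a native query to a Broker/Router and return the decoded JSON."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Roughly equivalent pydruid call (not executed here; needs a running Broker):
#   client = PyDruid("http://localhost:8082", "druid/v2")
#   query = client.timeseries(
#       datasource="wikipedia",
#       granularity="day",
#       intervals="2015-09-12/2015-09-13",
#       aggregations={"added": doublesum("added")},
#       filter=Filter(dimension="channel", value="#en.wikipedia"),
#   )
#   df = query.export_pandas()  # call export_pandas() on the Query, not the client

if __name__ == "__main__":
    body = timeseries_body("wikipedia", "2015-09-12/2015-09-13",
                           "added", "channel", "#en.wikipedia")
    print(json.dumps(body, indent=2))
```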
Quickstart
import os
from pydruid.db import connect
# Configure connection details (replace with your Druid Broker/Router info)
DRUID_HOST = os.environ.get('DRUID_HOST', 'localhost')
DRUID_PORT = int(os.environ.get('DRUID_PORT', '8082'))
DRUID_PATH = os.environ.get('DRUID_PATH', '/druid/v2/sql/')
DRUID_SCHEME = os.environ.get('DRUID_SCHEME', 'http')
try:
    conn = connect(
        host=DRUID_HOST,
        port=DRUID_PORT,
        path=DRUID_PATH,
        scheme=DRUID_SCHEME,
        header=True,  # request column names (Druid SQL >= 0.13.0); see Warnings
    )
    curs = conn.cursor()
    # 'wikipedia' is the Druid tutorial datasource; substitute your own
    curs.execute("SELECT COUNT(*) FROM wikipedia")
    result = curs.fetchone()
    print(f"Query result: {result}")
except Exception as e:
    # Connection errors usually surface here, when the first query is sent
    print(f"Error connecting to Druid or executing query: {e}")
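Because PyDruid implements the DB API 2.0, the Quickstart cursor exposes the standard `description` attribute, whose entries begin with the column name. The sketch below uses a hypothetical helper, `rows_as_dicts`, that turns fetched rows into dictionaries keyed by column name. To keep it runnable without a Druid cluster, the demo uses the standard-library `sqlite3` module as a stand-in DB API driver. With pydruid you would pass `curs` instead, preferably with `header=True` so that `description` is populated even for empty results.

```python
import sqlite3


def rows_as_dicts(cursor):
    """Return the remaining rows of a DB API 2.0 cursor as a list of dicts."""
    # description is None when the last statement produced no result set
    if cursor.description is None:
        return []
    # The first item of each description entry is the column name
    names = [column[0] for column in cursor.description]
    return [dict(zip(names, row)) for row in cursor.fetchall()]


if __name__ == "__main__":
    # sqlite3 stands in for pydruid here so the sketch runs without a cluster
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE wikipedia (channel TEXT, added INTEGER)")
    cur.executemany(
        "INSERT INTO wikipedia VALUES (?, ?)",
        [("#en.wikipedia", 10), ("#de.wikipedia", 5)],
    )
    cur.execute("SELECT channel, SUM(added) AS added FROM wikipedia GROUP BY channel")
    print(rows_as_dicts(cur))
    conn.close()
```

Because the helper relies only on `description` and `fetchall()`, it works unchanged with any DB API 2.0 cursor. Against Druid, the column names come from the SQL you run; `channel` and `added` above are assumptions.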