{"id":2683,"library":"pydruid","title":"PyDruid","description":"PyDruid is a Python connector for Apache Druid that exposes a simple API to create, execute, and analyze Druid queries. It can parse query results into Pandas DataFrame objects, which integrates Druid with the SciPy stack. The library offers both synchronous and asynchronous clients, implements the Python DB API 2.0, includes a SQLAlchemy dialect, and provides a command line interface. The current version is 0.6.9, and the project is under active development.","status":"active","version":"0.6.9","language":"en","source_language":"en","source_url":"https://github.com/druid-io/pydruid","tags":["database","druid","connector","sql","olap","data-analytics"],"install":[{"cmd":"pip install pydruid","lang":"bash","label":"Basic Installation"},{"cmd":"pip install \"pydruid[async,pandas,sqlalchemy,cli]\"","lang":"bash","label":"Installation with all optional features"}],"dependencies":[{"reason":"Required for exporting query results to Pandas DataFrames.","package":"pandas","optional":true},{"reason":"Required for the asynchronous client (pydruid.async_client).","package":"tornado","optional":true},{"reason":"Required for using the SQLAlchemy dialect.","package":"sqlalchemy","optional":true},{"reason":"Required for the command line interface (CLI); the [cli] extra also installs pygments and tabulate.","package":"prompt_toolkit","optional":true}],"imports":[{"symbol":"PyDruid","correct":"from pydruid.client import PyDruid"},{"symbol":"connect","correct":"from pydruid.db import connect"},{"symbol":"doublesum","correct":"from pydruid.utils.aggregators import doublesum"},{"symbol":"Filter","correct":"from pydruid.utils.filters import Filter"}],"quickstart":{"code":"import os\nfrom pydruid.db import connect\n\n# Configure connection details (replace with your Druid Broker or Router host and port)\nDRUID_HOST = os.environ.get('DRUID_HOST', 'localhost')\nDRUID_PORT = int(os.environ.get('DRUID_PORT', '8082'))\nDRUID_PATH = os.environ.get('DRUID_PATH', '/druid/v2/sql/')\nDRUID_SCHEME = os.environ.get('DRUID_SCHEME', 'http')\n\ntry:\n    conn = connect(\n        host=DRUID_HOST,\n        port=DRUID_PORT,\n        path=DRUID_PATH,\n        scheme=DRUID_SCHEME,\n        header=True,  # request column names in the response (Druid >= 0.13.0)\n    )\n    curs = conn.cursor()\n    # 'wikipedia' is the sample datasource loaded by the Druid tutorials\n    curs.execute(\"SELECT COUNT(*) FROM wikipedia\")\n    result = curs.fetchone()\n    print(f\"Query result: {result}\")\nexcept Exception as e:\n    print(f\"Error connecting to Druid or executing query: {e}\")","lang":"python","description":"This quickstart shows how to connect to Druid through the Python DB API 2.0 interface and run a basic SQL query. Make sure your Druid cluster is running and reachable at the specified host and port (8082 is the default Broker port; the Router listens on 8888 by default). You can override the connection parameters with the environment variables DRUID_HOST, DRUID_PORT, DRUID_PATH, and DRUID_SCHEME."},"warnings":[{"fix":"Pass `header=True` to `pydruid.db.connect(...)`, or append `?header=true` to the SQLAlchemy URI (e.g. `druid://localhost:8082/druid/v2/sql?header=true`). `header` is a top-level SQL request option, not a query-context parameter.","message":"Druid 0.13.0 added support for returning column names in SQL responses through the `header` request field. Without it, the cursor description cannot be determined for result sets that contain no rows, which breaks SQLAlchemy query statements. PyDruid defaults to `header=False` for backward compatibility, so enable it explicitly when connecting to Druid >= 0.13.0.","severity":"gotcha","affected_versions":"<= 0.6.x (when interacting with Druid >= 0.13.0)"},{"fix":"Upgrade pydruid to version 0.6.0 or newer to ensure compatibility with modern Python versions.","message":"pydruid versions older than 0.6.0 are incompatible with Python 3.8.2 and newer because of how `collections.abc` was imported. The failure shows up as `AttributeError: module 'collections' has no attribute 'abc'`.","severity":"breaking","affected_versions":"< 0.6.0 with Python >= 3.8.2"},{"fix":"After executing a query (e.g. `query_result = client.timeseries(...)`), call `query_result.export_pandas()` or `query_result.export_tsv(dest_path)` instead.","message":"The `export_pandas()` and `export_tsv()` methods on `BaseDruidClient` (and therefore on `PyDruid` instances) are deprecated. Use the same methods on the `Query` object returned by the query methods.","severity":"deprecated","affected_versions":"All versions where `Query` object exists"},{"fix":"Install `pydruid` with the `[sqlalchemy]` extra: `pip install \"pydruid[sqlalchemy]\"`.","message":"`sqlalchemy` is an optional dependency of `pydruid`. A plain `pip install pydruid` does not install it. Until the `[sqlalchemy]` extra (or SQLAlchemy itself) is installed, the SQLAlchemy dialect will not work.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-10T00:00:00.000Z","next_check":"2026-07-09T00:00:00.000Z"}