SQLAlchemy Apache Drill Dialect
sqlalchemy-drill is an Apache Drill dialect for SQLAlchemy, enabling Python applications to connect and interact with Apache Drill databases using SQLAlchemy's ORM or Core. It supports connections via REST API, JDBC, and ODBC. The library is actively maintained, with version 1.1.10 being the current release, and sees regular bug fix updates. Its primary purpose includes integration with tools like Apache Superset.
Warnings
- breaking When connecting to Apache Drill versions prior to 1.19 via the REST API (drill+sadrill), all data values are serialized to JSON strings. This means that numerical or temporal data will be returned as strings and require manual typecasting in your application.
- gotcha JDBC connections require manual setup of JPype and the Drill JDBC driver. The `sqlalchemy-drill` library does not start the JVM or manage JDBC driver paths for you. Users must explicitly call `jpype.startJVM()` and set environment variables like `DRILL_JDBC_DRIVER_PATH` and `DRILL_JDBC_JAR_NAME`.
- gotcha Optional dependencies (`JayDeBeApi`, `jPype1`, `pyodbc`) for JDBC and ODBC connectivity are NOT installed by default with `pip install sqlalchemy-drill`. Attempting to use JDBC or ODBC dialects without these packages will result in connection errors.
- gotcha Versions prior to `1.1.2` had issues with impersonation for Drill 1.20, potentially leading to incorrect user context for queries.
- gotcha Early versions (e.g., prior to `1.1.7` and `1.18`) contained bugs in DB-API compliance regarding method delegation and return values, which could lead to unexpected behavior during query execution or result fetching.
Install
-
pip install sqlalchemy-drill
Imports
- create_engine
from sqlalchemy import create_engine
- text
from sqlalchemy import text
Quickstart
from sqlalchemy import create_engine, text
import os
# For REST API connection
# Adjust host and port as per your Drill setup
# Use environment variables for sensitive data in production
# Example for local Drill embedded mode (drill+sadrill)
DR_HOST = os.environ.get('DRILL_HOST', 'localhost')
DR_PORT = os.environ.get('DRILL_REST_PORT', '8047')
DR_PLUGIN = os.environ.get('DRILL_STORAGE_PLUGIN', 'dfs')
DR_USE_SSL = os.environ.get('DRILL_USE_SSL', 'False').lower() == 'true'
connection_string = f"drill+sadrill://{DR_HOST}:{DR_PORT}/{DR_PLUGIN}?use_ssl={DR_USE_SSL}"
try:
engine = create_engine(connection_string)
with engine.connect() as connection:
result = connection.execute(text("SELECT * FROM INFORMATION_SCHEMA.CATALOGS LIMIT 5"))
print("Successfully connected to Apache Drill via REST API.")
print("First 5 catalogs:")
for row in result:
print(row)
except Exception as e:
print(f"Failed to connect to Apache Drill: {e}")
print("Ensure Apache Drill is running and accessible at the specified host/port.")
print("For JDBC or ODBC, ensure required optional dependencies and drivers are installed and configured.")