Impyla
Impyla is a Python client for HiveServer2 implementations, such as the Impala distributed query engine. It provides a Python DB API 2.0 (PEP 249)-compliant interface, enabling Python applications to connect to Impala and execute SQL queries. The library, currently at version 0.22.0, is actively maintained by Cloudera. Releases follow no fixed schedule, though alpha and stable versions often appear within the same year.
Warnings
- breaking In version 0.20.0, the behavior of `Cursor.rowcount` and the automatic closing of finished queries (`close_finished_queries`) changed. Existing applications that rely on the earlier behavior may break.
- gotcha When connecting to Impala via Impyla, ensure you use the HiveServer2 port (default 21050). Using the Beeswax port (default 21000), which the Impala shell typically uses, will result in connection errors.
- gotcha Impyla has been reported to break with newer versions of the `bitarray` library (e.g., version 2.1.0). Compatibility issues may arise if `bitarray` is updated beyond tested versions.
- gotcha Python 3.12 support is incomplete in Impyla versions up to 0.22.0. A known issue is that installation using `setuptools` may fail with Python 3.12.
- deprecated The `auth_cookie_names` parameter in the `connect()` API was deprecated in version 0.18.0. Authentication cookie functionality is now enabled by default.
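The port gotcha above can be caught before a connection is attempted. The following is a minimal sketch, not part of Impyla: `check_impala_port` and the two constants are hypothetical names introduced here, and the port numbers are the defaults quoted in the warning.

```python
import warnings

# Default ports as stated in the warning above.
HS2_DEFAULT_PORT = 21050      # HiveServer2 port, the one Impyla needs
BEESWAX_DEFAULT_PORT = 21000  # Beeswax port, typically used by the Impala shell


def check_impala_port(port):
    """Return the port as an int, warning if it is the default Beeswax port.

    Hypothetical helper for illustration; it only inspects the number and
    does not contact any server.
    """
    port = int(port)
    if port == BEESWAX_DEFAULT_PORT:
        warnings.warn(
            f"Port {port} is the default Beeswax port; Impyla expects the "
            f"HiveServer2 port (default {HS2_DEFAULT_PORT}).",
            stacklevel=2,
        )
    return port
```

The returned value could then be passed as the `port` argument of `connect()`. A deployment that deliberately remaps ports would need a different check.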
Install
pip install impyla
Imports
- connect
from impala.dbapi import connect
Quickstart
import os

from impala.dbapi import connect

# Host and port come from the environment; 21050 is the default HiveServer2 port.
IMPALA_HOST = os.environ.get('IMPYLA_TEST_HOST', 'localhost')
IMPALA_PORT = int(os.environ.get('IMPYLA_TEST_PORT', 21050))

try:
    conn = connect(host=IMPALA_HOST, port=IMPALA_PORT)
    cursor = conn.cursor()
    # List the tables visible in the current database.
    cursor.execute('SHOW TABLES')
    tables = cursor.fetchall()
    print('Tables in Impala:')
    for table in tables:
        print(table[0])
    cursor.close()
    conn.close()
except Exception as e:
    print(f"Could not connect or query Impala: {e}")
    print("Please ensure an Impala daemon is running at the specified host and port.")
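The quickstart closes the cursor and connection only when every step succeeds. One possible variation, sketched below, wraps both in `contextlib.closing` so that `close()` runs even if the query raises. The function name `list_tables` is hypothetical, and the `connect` callable is passed in as an argument so the sketch relies only on the `cursor()`, `execute()`, `fetchall()` and `close()` methods that DB API 2.0 defines.

```python
from contextlib import closing


def list_tables(connect, host="localhost", port=21050):
    """Return the first column of SHOW TABLES as a list of names.

    `connect` is any callable accepting host= and port= and returning a
    DB API 2.0 connection (for example, impala.dbapi.connect).
    """
    # closing() calls .close() on exit, whether or not an exception occurred.
    with closing(connect(host=host, port=port)) as conn:
        with closing(conn.cursor()) as cursor:
            cursor.execute("SHOW TABLES")
            return [row[0] for row in cursor.fetchall()]
```

With Impyla installed, this could be called as `list_tables(connect)` after `from impala.dbapi import connect`. Passing the callable in also makes the function easy to exercise with a stand-in connection when no Impala daemon is available.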