Pure Transport for PyHive
pure-transport is a Python library providing a SASL-based Thrift transport layer specifically designed for PyHive. It aims to offer better compatibility with newer versions of the `thrift` library (0.11.0+) than the `thrift_sasl` library, addressing common issues encountered when connecting to Hive or Impala servers requiring SASL authentication. The current version is 0.2.0, and it is released on an as-needed basis to support PyHive ecosystems.
Common errors
- ImportError: cannot import name 'TSaslClientTransport' from 'thrift_sasl'
  cause This error often occurs when `thrift_sasl` is incompatible with the installed `thrift` version (0.11.0 or newer).
  fix Use `from pure_transport.sasl_transport import TSaslClientTransport` instead of `from thrift_sasl import TSaslClientTransport`. Also ensure `pure-transport` is installed via `pip install pure-transport`.
- TTransportException: Could not connect to ...
  cause The underlying `TSocket` failed to establish a connection to the specified host/port, or the connection was immediately rejected by the server, often due to authentication failure or the server not running.
  fix Check network connectivity to the Hive/Impala server (host, port). Verify the server is running. Ensure SASL authentication parameters (mechanism, username, password, service_name) are correct and match the target server's configuration.
- AttributeError: module 'pure_transport' has no attribute 'TSaslClientTransport'
  cause Attempting to import `TSaslClientTransport` directly from the top-level `pure_transport` module instead of its specific submodule.
  fix Correct the import path to `from pure_transport.sasl_transport import TSaslClientTransport`.
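When a `TTransportException` appears, it can help to first rule out plain network problems before looking at SASL settings. The following is a minimal sketch using only the standard library; the helper name `can_connect` and the default timeout are illustrative choices, not part of `pure-transport` or `thrift`.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds.

    This only checks reachability. It performs no SASL negotiation,
    so a True result does not mean authentication will succeed.
    """
    try:
        # create_connection resolves the host and opens a TCP socket;
        # the context manager closes it again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

If `can_connect("localhost", 10000)` returns False, the problem is likely network or server availability; if it returns True and the transport still fails, the SASL parameters are the more likely cause.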
Warnings
- breaking Using `thrift_sasl.TSaslClientTransport` with `thrift` library versions 0.11.0 or higher can lead to connection errors or unexpected behavior due to API changes in `thrift`. `pure-transport` was developed specifically to address this incompatibility.
- gotcha Connections can fail when the SASL mechanism (e.g., `PLAIN`, `GSSAPI`) is misconfigured or when the authentication parameters (username, password, service principal name) are incorrect.
- gotcha Although `pure-transport` replaces `thrift_sasl.TSaslClientTransport` for compatibility, it still depends on the `thrift_sasl` library for its underlying SASL negotiation logic. If `thrift_sasl` is not installed, `pure-transport` will fail to initialize or perform SASL negotiation.
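Since the warnings above depend on which packages and versions are installed, a quick environment check can save debugging time. This is a rough sketch built on the standard library's `importlib`; the helper names `missing_modules` and `installed_version` are hypothetical and only for illustration.

```python
from importlib import metadata, util


def missing_modules(names):
    """Return the module names from `names` that cannot be located."""
    missing = []
    for name in names:
        try:
            found = util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised, for example, when the parent package of a
            # dotted name is itself not installed.
            found = False
        if not found:
            missing.append(name)
    return missing


def installed_version(distribution):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None
```

For example, `missing_modules(["thrift", "thrift_sasl", "pure_transport"])` lists whichever of the modules named on this page are absent, and `installed_version("thrift")` shows whether the installed `thrift` is in the 0.11.0+ range discussed above.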
Install
- pip install pure-transport
Imports
- TSaslClientTransport
  wrong `from thrift_sasl import TSaslClientTransport`
  correct `from pure_transport.sasl_transport import TSaslClientTransport`
Quickstart
import os
from thrift.transport.TSocket import TSocket
from thrift.transport.TTransport import TBufferedTransport
from thrift.protocol import TBinaryProtocol
from pure_transport.sasl_transport import TSaslClientTransport
# 1. Define connection details (replace with your actual server info)
# Using os.environ.get for runnable example, set environment variables or replace directly
HIVE_HOST = os.environ.get("HIVE_HOST", "localhost")
HIVE_PORT = int(os.environ.get("HIVE_PORT", "10000"))
SASL_MECHANISM = os.environ.get("HIVE_SASL_MECHANISM", "PLAIN") # e.g., PLAIN, GSSAPI
SASL_USERNAME = os.environ.get("HIVE_SASL_USERNAME", "testuser")
SASL_PASSWORD = os.environ.get("HIVE_SASL_PASSWORD", "testpass") # Only for PLAIN
auth_params = {
    "mechanism": SASL_MECHANISM,
    "username": SASL_USERNAME,
}
if SASL_MECHANISM == "PLAIN":
    auth_params["password"] = SASL_PASSWORD
elif SASL_MECHANISM == "GSSAPI":
    auth_params["service_name"] = os.environ.get("HIVE_SERVICE_NAME", "hive")
    # Add other GSSAPI parameters if necessary, e.g., principal, keytab
print(f"Attempting to initialize transport for {HIVE_HOST}:{HIVE_PORT} with SASL {SASL_MECHANISM}...")
try:
    # 2. Create a TSocket (from thrift)
    # In a real PyHive/Impala-dbapi scenario, this is often handled internally.
    socket = TSocket(HIVE_HOST, HIVE_PORT)
    # 3. Create the pure_transport SASL transport, wrapping the TSocket
    # This is the core component provided by pure-transport
    transport = TSaslClientTransport(
        socket,
        HIVE_HOST,  # Host for SASL negotiation (e.g., Kerberos principal resolution)
        **auth_params
    )
    # 4. Wrap with a buffered transport (optional but common for performance)
    buffered_transport = TBufferedTransport(transport)
    # 5. Create a Thrift protocol
    protocol = TBinaryProtocol.TBinaryProtocol(buffered_transport)
    print(f"\nSuccessfully initialized pure_transport TSaslClientTransport for {HIVE_HOST}:{HIVE_PORT}")
    print(f"SASL Mechanism: {SASL_MECHANISM}")
    print(f"Transport object type: {type(transport).__name__}")
    print("This transport and protocol can now be used with a Thrift client (e.g., PyHive).")
    # In a full application, you would then open the transport and use a Thrift client:
    # buffered_transport.open()
    # client = MyThriftClient(protocol)
    # result = client.some_method()
    # buffered_transport.close()
except Exception as e:
    print(f"\nFailed to initialize transport: {e}")
    print("Ensure `pure-transport`, `thrift`, and `thrift_sasl` are installed.")
    print("For actual connection, verify network and server availability.")
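The parameter selection in the quickstart can be factored into a small function that fails early on incomplete settings, which may help with the misconfiguration gotcha noted under Warnings. This is only a sketch: `build_auth_params` is a hypothetical helper, it reuses the quickstart's environment variable names, and it handles only `PLAIN` and `GSSAPI` because those are the mechanisms shown on this page, not because others are known to be unsupported.

```python
def build_auth_params(env):
    """Build SASL keyword arguments from a mapping of settings.

    `env` is any mapping with a .get method, such as os.environ or a
    plain dict. Keys follow the quickstart's variable names.
    """
    mechanism = env.get("HIVE_SASL_MECHANISM", "PLAIN").upper()
    params = {
        "mechanism": mechanism,
        "username": env.get("HIVE_SASL_USERNAME", "testuser"),
    }
    if mechanism == "PLAIN":
        password = env.get("HIVE_SASL_PASSWORD")
        if not password:
            # Fail before any network activity if the password is absent.
            raise ValueError("PLAIN requires HIVE_SASL_PASSWORD to be set")
        params["password"] = password
    elif mechanism == "GSSAPI":
        params["service_name"] = env.get("HIVE_SERVICE_NAME", "hive")
    else:
        # Assumption of this sketch: anything else is treated as unhandled.
        raise ValueError(f"Unhandled SASL mechanism: {mechanism}")
    return params
```

Calling `build_auth_params(os.environ)` yields a dict that could be passed as `**auth_params` in the quickstart, with a `ValueError` raised up front when a required setting is missing.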