Pure Transport for PyHive

0.2.0 · active · verified Fri Apr 17

pure-transport is a Python library that provides a SASL-based Thrift transport layer for PyHive. It aims to be more compatible with newer versions of the `thrift` library (0.11.0+) than `thrift_sasl` is, addressing common problems when connecting to Hive or Impala servers that require SASL authentication. The current version is 0.2.0; releases are made as needed to support the PyHive ecosystem.
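
Because the compatibility target is `thrift` 0.11.0 or newer, it can help to confirm which `thrift` version is installed before debugging connection problems. The following is a minimal sketch using only the standard library; `parse_version` and `thrift_meets_minimum` are illustrative helper names and are not part of pure-transport.

```python
from importlib.metadata import PackageNotFoundError, version

# Threshold taken from the description above (thrift 0.11.0+).
MINIMUM_THRIFT = (0, 11, 0)


def parse_version(text):
    """Turn a string such as '0.13.0' into (0, 13, 0).

    Only the leading digits of the first three dot-separated pieces are
    used, so a suffix like 'rc1' is ignored.
    """
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def thrift_meets_minimum(installed=None):
    """Return True or False, or None when thrift is not installed."""
    if installed is None:
        try:
            installed = version("thrift")
        except PackageNotFoundError:
            return None
    return parse_version(installed) >= MINIMUM_THRIFT


if __name__ == "__main__":
    print("thrift >= 0.11.0:", thrift_meets_minimum())
```

Passing a version string explicitly, for example `thrift_meets_minimum("0.10.0")`, skips the lookup and makes the check easy to exercise in isolation.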

Quickstart

This quickstart shows how to construct a `pure_transport.sasl_transport.TSaslClientTransport` from `thrift` components. The resulting transport object can then be passed to a higher-level client such as PyHive or impyla (`impala.dbapi`). The example reads its configuration from environment variables so that it is easy to test; set those variables or replace the defaults with your actual connection details. It only constructs the objects and does not open a connection.

import os
from thrift.transport.TSocket import TSocket
from thrift.transport.TTransport import TBufferedTransport
from thrift.protocol import TBinaryProtocol
from pure_transport.sasl_transport import TSaslClientTransport

# 1. Define connection details.
# Values are read with os.environ.get so the example can be configured without
# code changes; set the variables or replace the defaults with your server info.
HIVE_HOST = os.environ.get("HIVE_HOST", "localhost")
HIVE_PORT = int(os.environ.get("HIVE_PORT", "10000"))
SASL_MECHANISM = os.environ.get("HIVE_SASL_MECHANISM", "PLAIN")  # e.g., PLAIN, GSSAPI
SASL_USERNAME = os.environ.get("HIVE_SASL_USERNAME", "testuser")
SASL_PASSWORD = os.environ.get("HIVE_SASL_PASSWORD", "testpass")  # only used for PLAIN

auth_params = {
    "mechanism": SASL_MECHANISM,
    "username": SASL_USERNAME,
}
if SASL_MECHANISM == "PLAIN":
    auth_params["password"] = SASL_PASSWORD
elif SASL_MECHANISM == "GSSAPI":
    auth_params["service_name"] = os.environ.get("HIVE_SERVICE_NAME", "hive")
    # Add other GSSAPI parameters if necessary, e.g., principal, keytab

print(f"Attempting to initialize transport for {HIVE_HOST}:{HIVE_PORT} with SASL {SASL_MECHANISM}...")
try:
    # 2. Create a TSocket (from thrift)
    # In a real PyHive or impyla scenario, this is often handled internally.
    socket = TSocket(HIVE_HOST, HIVE_PORT)

    # 3. Create the pure_transport SASL transport, wrapping the TSocket
    # This is the core component provided by pure-transport
    transport = TSaslClientTransport(
        socket,
        HIVE_HOST, # Host for SASL negotiation (e.g., Kerberos principal resolution)
        **auth_params
    )

    # 4. Wrap with a buffered transport (optional but common for performance)
    buffered_transport = TBufferedTransport(transport)

    # 5. Create a Thrift protocol
    protocol = TBinaryProtocol.TBinaryProtocol(buffered_transport)

    print(f"\nSuccessfully initialized pure_transport TSaslClientTransport for {HIVE_HOST}:{HIVE_PORT}")
    print(f"SASL Mechanism: {SASL_MECHANISM}")
    print(f"Transport object type: {type(transport).__name__}")
    print("This transport and protocol can now be used with a Thrift client (e.g., PyHive).")

    # In a full application, you would then open the transport and use a Thrift client:
    # buffered_transport.open()
    # client = MyThriftClient(protocol)
    # result = client.some_method()
    # buffered_transport.close()

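except Exception as e:
    # Only construction errors land here; a missing package would already
    # have raised ImportError at the imports above.
    print(f"\nFailed to initialize transport: {e}")
    print("Check the SASL mechanism and parameters passed to TSaslClientTransport.")
    print("For an actual connection, also verify network and server availability.")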
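
The mechanism-dependent parameter handling in the quickstart can be pulled into a small function, which makes it testable without a server. This is a sketch, not part of pure-transport: `build_auth_params` is a hypothetical helper, and the keyword names (`mechanism`, `username`, `password`, `service_name`) and environment variable names simply mirror those used in the quickstart.

```python
import os


def build_auth_params(env=None):
    """Build the SASL keyword arguments used in the quickstart.

    `env` is any mapping with a .get method; it defaults to os.environ.
    """
    env = os.environ if env is None else env
    mechanism = env.get("HIVE_SASL_MECHANISM", "PLAIN")
    params = {
        "mechanism": mechanism,
        "username": env.get("HIVE_SASL_USERNAME", "testuser"),
    }
    if mechanism == "PLAIN":
        # A password is only attached for the PLAIN mechanism.
        params["password"] = env.get("HIVE_SASL_PASSWORD", "testpass")
    elif mechanism == "GSSAPI":
        # Kerberos needs the service name instead of a password.
        params["service_name"] = env.get("HIVE_SERVICE_NAME", "hive")
    return params


if __name__ == "__main__":
    # Print only the keys so that no password ends up in the output.
    print(sorted(build_auth_params()))
```

Passing a plain dictionary instead of `os.environ` keeps tests independent of the process environment.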

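The commented-out open, use, and close steps at the end of the quickstart leave the transport open if an exception occurs between `open()` and `close()`. One option is a small context manager that always closes it. This sketch assumes only that the transport exposes `open()` and `close()`, as Thrift transports do; `opened` and the `FakeTransport` stand-in are illustrative names, not part of pure-transport or `thrift`.

```python
from contextlib import contextmanager


@contextmanager
def opened(transport):
    """Open the transport, yield it, and close it even if the body raises."""
    transport.open()
    try:
        yield transport
    finally:
        transport.close()


class FakeTransport:
    """Stand-in used only to demonstrate the pattern without a server."""

    def __init__(self):
        self.events = []

    def open(self):
        self.events.append("open")

    def close(self):
        self.events.append("close")


if __name__ == "__main__":
    fake = FakeTransport()
    with opened(fake):
        fake.events.append("use")
    print(fake.events)  # ['open', 'use', 'close']
```

With a real transport, the body of the `with` block would be where the Thrift client is created and called.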