{"id":9219,"library":"pyhs2","title":"Python Hive Server 2 Client Driver","description":"pyHS2 is a Python client driver designed for connecting to Hive Server 2. The project's last stable release was 0.6.0 in November 2014. It is no longer actively maintained, with the developer ceasing support in early 2016 and recommending alternative libraries.","status":"abandoned","version":"0.6.0","language":"en","source_language":"en","source_url":"https://github.com/BradRuderman/pyhs2","tags":["hive","hiveserver2","bigdata","database","python2","deprecated"],"install":[{"cmd":"pip install pyhs2","lang":"bash","label":"Install pyhs2"},{"cmd":"sudo yum install cyrus-sasl-devel # For Red Hat/CentOS/Fedora\nsudo apt-get update && sudo apt-get install libsasl2-dev # For Debian/Ubuntu","lang":"bash","label":"Install SASL development headers (prerequisite for 'sasl' dependency)"}],"dependencies":[{"reason":"Required for SASL authentication mechanisms. This dependency needs system-level SASL development headers for compilation.","package":"sasl","optional":false},{"reason":"Underpins the communication protocol with Hive Server 2.","package":"thrift","optional":false}],"imports":[{"note":"The primary connection function is directly available under the top-level 'pyhs2' module.","wrong":"from pyhs2.connections import connect","symbol":"connect","correct":"import pyhs2\nconn = pyhs2.connect(...)"},{"symbol":"Pyhs2Exception","correct":"from pyhs2.error import Pyhs2Exception"}],"quickstart":{"code":"import os\nimport pyhs2\n\nhive_host = os.environ.get('HIVE_HOST', 'localhost')\nhive_port = int(os.environ.get('HIVE_PORT', '10000'))\nhive_user = os.environ.get('HIVE_USER', 'hive')\nhive_password = os.environ.get('HIVE_PASSWORD', '')\nhive_database = os.environ.get('HIVE_DATABASE', 'default')\n\ntry:\n    with pyhs2.connect(\n        host=hive_host,\n        port=hive_port,\n        authMechanism=\"PLAIN\", # or \"KERBEROS\", \"LDAP\", \"NOSASL\"; must match the server's configuration\n        user=hive_user,\n        
password=hive_password,\n        database=hive_database\n    ) as conn:\n        print(\"Successfully connected to Hive Server 2.\")\n        with conn.cursor() as cur:\n            # Show databases (str.format is used because pyhs2 runs on Python 2, which has no f-strings)\n            print(\"Databases: {0}\".format(cur.getDatabases()))\n\n            # Execute a query\n            cur.execute(\"SELECT * FROM some_table LIMIT 5\")\n\n            # Return column info\n            print(\"Schema: {0}\".format(cur.getSchema()))\n\n            # Fetch table results\n            print(\"Query Results:\")\n            for row in cur.fetch():\n                print(row)\n\nexcept pyhs2.error.Pyhs2Exception as e:\n    print(\"pyhs2 error: {0}\".format(e))\nexcept Exception as e:\n    print(\"An unexpected error occurred: {0}\".format(e))\n","lang":"python","description":"This quickstart demonstrates how to establish a connection to Hive Server 2 using `pyhs2`, execute a sample query, retrieve schema information, and fetch results. Connection parameters are read from environment variables, with defaults for a local server. The code is written for Python 2.x, the only interpreter line `pyhs2` supports. Ensure your Hive Server 2 is running and accessible, and replace `some_table` with an existing table."},"warnings":[{"fix":"Use Python 2.x for `pyhs2` projects, or migrate to a maintained Python 3.x compatible alternative like `PyHive` or `impyla`.","message":"`pyhs2` does NOT support Python 3.x. It is designed for Python 2.x, and attempts to use it with Python 3.x fail with `ModuleNotFoundError` or other incompatibilities, because of Python 2-only code in its dependency chain, including the `sasl` package and the `cStringIO` module.","severity":"breaking","affected_versions":"All versions (0.1.0 - 0.6.0) when used with Python 3.x"},{"fix":"Consider migrating to actively maintained libraries such as `PyHive` (from Dropbox) or `impyla` (from Cloudera) for connecting to Hive Server 2.","message":"The `pyhs2` library is no longer maintained. 
The last release was in 2014, and the developer officially stated that maintenance ceased in January 2016, recommending alternatives.","severity":"deprecated","affected_versions":"All versions (0.1.0 - 0.6.0)"},{"fix":"Before `pip install pyhs2`, install the necessary system packages: `sudo yum install cyrus-sasl-devel` on Red Hat/CentOS/Fedora systems, or `sudo apt-get update && sudo apt-get install libsasl2-dev` on Debian/Ubuntu systems.","message":"Installing `pyhs2` often fails due to missing system-level development headers for `cyrus-sasl`. The `sasl` Python package, a dependency, requires these headers to compile.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Verify the Hive Server 2 hostname/IP, port (commonly 10000), and the correct authentication mechanism (e.g., `PLAIN` for simple username/password, or `KERBEROS` if configured). Check `hive-site.xml` for correct HiveServer2 configuration.","message":"Connection issues (`TTransport.TTransportException: Could not connect`) are frequently caused by incorrect host, port, authentication mechanism, user, or password. Hive Server 2 typically runs on port 10000.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Instead of `for i in cur.fetch():`, use a loop with `while cur.hasMoreRows: print cur.fetchone()`. Consider adding `LIMIT` clauses to queries during development.","message":"Fetching large result sets can appear to hang or be inefficient. The `fetchone()` and `hasMoreRows` pattern is recommended over simply iterating `cur.fetch()` for better control.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Use Python 2.7 for your project, or switch to a Python 3-compatible Hive client like `PyHive` or `impyla`.","cause":"This error typically occurs when running `pyhs2` with Python 3.x. 
`pyhs2` is not Python 3 compatible.","error":"ModuleNotFoundError: No module named 'cloudera'"},{"fix":"Verify that Hive Server 2 is running and listening on the specified host and port (e.g., `localhost:10000`). Check firewall rules, network connectivity, and the `hive-site.xml` configuration for the correct HiveServer2 endpoint.","cause":"The Python client could not establish a network connection to the specified Hive Server 2 host and port. This could be due to the server not running, incorrect IP/hostname, firewall issues, or an incorrect port.","error":"thrift.transport.TTransport.TTransportException: Could not connect to localhost:10000"},{"fix":"Install the appropriate system-level SASL development packages: For Red Hat/CentOS/Fedora, run `sudo yum install cyrus-sasl-devel`. For Debian/Ubuntu, run `sudo apt-get update && sudo apt-get install libsasl2-dev`. Then retry `pip install pyhs2`.","cause":"During `pip install pyhs2`, the `sasl` dependency fails to compile because the system is missing the development headers for the Cyrus SASL library.","error":"fatal error: sasl/sasl.h: No such file or directory"},{"fix":"First, try running the exact same Hive query directly in the Hive shell to isolate if it's a `pyhs2` or Hive/query issue. If the query works in Hive shell, check the user permissions in the `pyhs2.connect()` call. Ensure the user has appropriate read/write access to tables and databases in Hive.","cause":"This error often indicates a problem with the Hive query itself, permissions for the user executing the query, or issues with the underlying Hadoop/MapReduce job that Hive attempts to run.","error":"pyhs2.error.Pyhs2Exception: 'Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask'"}]}