Hive Metastore Client
The `hive-metastore-client` library provides a Pythonic interface for connecting to and performing Data Definition Language (DDL) operations on a Hive Metastore using the Thrift protocol. It simplifies interactions with Hive metadata, enabling users to programmatically create and manage databases, tables, and partitions. Actively maintained by QuintoAndar, the library is currently at version 1.0.9, offering a high-level abstraction over the underlying Thrift APIs.
Common errors
- TTransportException: Could not connect to <HIVE_HOST>:<HIVE_PORT>
  cause: The Python client failed to establish a network connection to the specified Hive Metastore host and port. This is often due to the Metastore service not running, incorrect host/port, or network/firewall issues.
  fix: Verify that the Hive Metastore service is running. Check that the `HIVE_METASTORE_HOST` and `HIVE_METASTORE_PORT` (or equivalent) configured in your client code precisely match the running Metastore service. Use network tools (like `ping`, `telnet`, or `nc`) to confirm network reachability from the client machine to the Metastore server on the specified port.
- Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
  cause: This error, typically seen in the Hive Metastore server logs or indirectly reported, indicates a fundamental issue within the Metastore's ability to initialize its internal client, often due to configuration problems, database connectivity, or resource exhaustion (e.g., too many connections to its backend database).
  fix: This usually points to an issue on the Hive Metastore server side, not directly the Python client. Check the Hive Metastore server logs for more detailed errors. Common fixes include verifying the `hive-site.xml` configuration (especially `hive.metastore.uris` and database connection details), ensuring the Metastore's backend database is accessible and not overloaded, and restarting the Metastore service.
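When `telnet` or `nc` is not available on the client machine, the same reachability check can be done from Python. The following is a minimal sketch using only the standard library; `can_connect` is a hypothetical helper name, not part of `hive-metastore-client`, and a successful TCP connection only shows that something is listening on the port, not that it is a healthy Metastore.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it does not speak Thrift.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures
        return False
```

For example, `can_connect("localhost", 9083)` returning `False` suggests the problem is at the network or service level rather than in the client code.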
Warnings
- gotcha The client relies on the Thrift protocol. If you attempt to manually compile or regenerate Thrift files (an advanced use case), ensure you use a compatible `thrift` compiler version. The project documentation noted using Thrift 0.14.0 for internal generation. Incompatibilities can lead to cryptic errors if the generated Python code is not consistent with the Metastore server's Thrift definition.
- gotcha Direct connection failures often stem from an incorrect Hive Metastore host or port. `HiveMetastoreClient` needs the exact host name and port on which the Metastore's Thrift service is listening.
- gotcha Compatibility with the underlying Hive Metastore server version is crucial. While `hive-metastore-client` aims to abstract complexities, major version differences in Hive (e.g., Hive 2.x vs 3.x vs 4.x) can introduce breaking changes in the Metastore's Thrift API. Such changes might affect functionality or lead to unexpected behavior if the client's internal Thrift definitions do not align with the server.
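Since a wrong host or port is a frequent cause of connection failures, it can help to validate these settings before opening a connection. The sketch below assumes the `HIVE_METASTORE_HOST` and `HIVE_METASTORE_PORT` environment variables used elsewhere on this page; `load_metastore_address` is a hypothetical helper, not a library function.

```python
import os
from typing import Mapping, Tuple


def load_metastore_address(env: Mapping[str, str] = os.environ) -> Tuple[str, int]:
    """Read and validate the Metastore host and port from a settings mapping.

    Hypothetical helper; the variable names and defaults follow this page.
    """
    host = env.get("HIVE_METASTORE_HOST", "localhost").strip()
    raw_port = env.get("HIVE_METASTORE_PORT", "9083").strip()

    if not host:
        raise ValueError("HIVE_METASTORE_HOST is empty")
    if "://" in host:
        # A URI such as thrift://host:9083 is a common mistake here
        raise ValueError(f"expected a bare host name, got a URI: {host!r}")

    try:
        port = int(raw_port)
    except ValueError:
        raise ValueError(f"HIVE_METASTORE_PORT is not an integer: {raw_port!r}") from None
    if not 1 <= port <= 65535:
        raise ValueError(f"HIVE_METASTORE_PORT is out of range: {port}")

    return host, port
```

The returned pair can then be passed to `HiveMetastoreClient(host, port)`. Failing early with a specific message tends to be easier to debug than a generic transport error.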
Install
- pip install hive-metastore-client
Imports
- HiveMetastoreClient
from hive_metastore_client import HiveMetastoreClient
- DatabaseBuilder
from hive_metastore_client.builders import DatabaseBuilder
- TableBuilder
from hive_metastore_client.builders import TableBuilder
Quickstart
import os

from hive_metastore_client import HiveMetastoreClient
from hive_metastore_client.builders import (
    ColumnBuilder,
    DatabaseBuilder,
    SerDeInfoBuilder,
    StorageDescriptorBuilder,
    TableBuilder,
)
from hive_metastore_client.thrift_files.libraries.thrift_hive_metastore_client.ttypes import AlreadyExistsException

HIVE_HOST = os.environ.get('HIVE_METASTORE_HOST', 'localhost')
HIVE_PORT = int(os.environ.get('HIVE_METASTORE_PORT', '9083'))
db_name = "my_test_database"
table_name = "my_test_table"

try:
    with HiveMetastoreClient(HIVE_HOST, HIVE_PORT) as hive_client:
        # 1. Create the database; this is a no-op if it already exists
        database = DatabaseBuilder(name=db_name).build()
        hive_client.create_database_if_not_exists(database)
        print(f"Database '{db_name}' created successfully (or already exists).")

        # 2. Describe the table; columns are attached via the storage descriptor
        columns = [
            ColumnBuilder("id", "int").build(),
            ColumnBuilder("name", "string").build(),
        ]
        serde_info = SerDeInfoBuilder(
            serialization_lib="org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
        ).build()
        storage_descriptor = StorageDescriptorBuilder(
            columns=columns,
            location=f"/tmp/{db_name}/{table_name}",  # placeholder; use your real table path
            input_format="org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
            output_format="org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
            serde_info=serde_info,
        ).build()
        table = TableBuilder(
            table_name=table_name, db_name=db_name, storage_descriptor=storage_descriptor
        ).build()

        # 3. Create the table, tolerating one that already exists
        try:
            hive_client.create_table(table)
        except AlreadyExistsException:
            pass
        print(f"Table '{db_name}.{table_name}' created successfully (or already exists).")

        # 4. Read the table back (Thrift Table objects expose tableName and dbName)
        retrieved_table = hive_client.get_table(db_name, table_name)
        print(f"Retrieved table: {retrieved_table.tableName} in database {retrieved_table.dbName}")
except Exception as e:
    print(f"An error occurred: {e}")
    print("Ensure HIVE_METASTORE_HOST and HIVE_METASTORE_PORT environment variables are set or defaults are correct.")
    print("Also, verify that the Hive Metastore service is running and accessible.")
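
The quickstart catches every exception in one place. If the Metastore is briefly unreachable (for example during a restart), retrying only on connection-level errors may be more useful. The following is a generic sketch, not part of `hive-metastore-client`; `with_retries` and its parameters are hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    operation: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    delay: float = 1.0,
) -> T:
    """Call operation(), retrying on the given exception types.

    Hypothetical helper: waits delay, 2 * delay, ... between attempts and
    re-raises the last error once the attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay * attempt)  # linear backoff between attempts
    raise AssertionError("unreachable")
```

With this library, one option is to wrap the `with HiveMetastoreClient(...)` block in a function and pass `retry_on=(TTransportException,)`, importing `TTransportException` from `thrift.transport.TTransport`. Exceptions outside `retry_on`, such as a Metastore-side validation error, propagate immediately.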