{"id":1839,"library":"hdfs","title":"HdfsCLI: API and Command Line Interface for HDFS","description":"HdfsCLI provides a Python API and command-line interface for interacting with Hadoop HDFS via the WebHDFS (and HttpFS) API. It supports both secure and insecure clusters, offering Python 3 bindings for common HDFS operations. The library includes optional extensions for handling Avro files, Pandas DataFrames, and Kerberos authentication. The current version, 2.7.3, was released on October 12, 2023.","status":"active","version":"2.7.3","language":"en","source_language":"en","source_url":"https://github.com/mtth/hdfs","tags":["hdfs","hadoop","filesystem","client","webhdfs","httpfs","bigdata"],"install":[{"cmd":"pip install hdfs","lang":"bash","label":"Core library"},{"cmd":"pip install 'hdfs[avro,dataframe,kerberos]'","lang":"bash","label":"With optional extensions"}],"dependencies":[{"reason":"Required for the 'avro' extension to read and write Avro files.","package":"fastavro","optional":true},{"reason":"Required for the 'dataframe' extension to load and save Pandas DataFrames.","package":"pandas","optional":true},{"reason":"Required for the 'kerberos' extension to connect to Kerberos-authenticated clusters.","package":"requests-kerberos","optional":true}],"imports":[{"note":"The default and simplest client for insecure HDFS clusters.","symbol":"InsecureClient","correct":"from hdfs.client import InsecureClient"},{"note":"The base client class; InsecureClient, TokenClient, and KerberosClient are subclasses of it. 
Instances are often created from a configuration alias via `Config().get_client(alias)` (`from hdfs import Config`).","symbol":"Client","correct":"from hdfs.client import Client"},{"note":"Used for Hadoop delegation-token authentication with HDFS.","symbol":"TokenClient","correct":"from hdfs.client import TokenClient"},{"note":"Used for Kerberos-authenticated clusters; requires the 'kerberos' extension.","symbol":"KerberosClient","correct":"from hdfs.ext.kerberos import KerberosClient"}],"quickstart":{"code":"import os\nimport posixpath  # HDFS paths always use '/' separators\n\nfrom hdfs.client import InsecureClient\n\n# WebHDFS endpoint of the NameNode (default HTTP port: 50070 on Hadoop 2, 9870 on Hadoop 3)\nHDFS_NAMENODE_URL = os.environ.get('HDFS_NAMENODE_URL', 'http://localhost:50070')\nHDFS_USER = os.environ.get('HDFS_USER', 'guest') # Or a specific HDFS user\n\ntry:\n    # Creating the client does not contact the cluster; the first request does.\n    client = InsecureClient(HDFS_NAMENODE_URL, user=HDFS_USER)\n    print(f\"Client configured for {HDFS_NAMENODE_URL} as user {HDFS_USER}\")\n\n    # Example: Create (or replace) a UTF-8 text file\n    hdfs_path = '/user/temp/my_test_file.txt'\n    with client.write(hdfs_path, encoding='utf-8', overwrite=True) as writer:\n        writer.write('Hello, HdfsCLI world!')\n    print(f\"Successfully wrote to {hdfs_path}\")\n\n    # Example: List contents of the parent directory\n    parent_dir = posixpath.dirname(hdfs_path)\n    print(f\"Contents of {parent_dir}:\")\n    for item in client.list(parent_dir):\n        print(f\"- {item}\")\n\n    # Example: Read the file back\n    with client.read(hdfs_path, encoding='utf-8') as reader:\n        read_data = reader.read()\n    print(f\"Read from {hdfs_path}: {read_data}\")\n\n    # Example: Delete the file (delete() returns False if nothing existed at the path)\n    deleted = client.delete(hdfs_path)\n    print(f\"Deleted {hdfs_path}: {deleted}\")\n\nexcept Exception as e:\n    # Covers HdfsError (e.g. permission denied) as well as network errors raised by requests\n    print(f\"An error occurred: {e}\")\n    print(\"Please ensure HDFS is running and HDFS_NAMENODE_URL/HDFS_USER are correctly configured.\")","lang":"python","description":"This quickstart demonstrates how to create an `InsecureClient` for an HDFS NameNode, write a small text file, list 
directory contents, read the file back, and then delete it. It reads the NameNode URL and user from environment variables for flexibility. Ensure your HDFS cluster is running and its WebHDFS endpoint is reachable at the specified URL."},"warnings":[{"fix":"Use Python 3.7 or newer. If you must stay on Python 2, pin an older `hdfs` release that still supported it, but this is not recommended because those releases are unmaintained.","message":"Current HdfsCLI releases no longer officially support Python 2.x; they are compatible with Python 3.7+.","severity":"breaking","affected_versions":"<2.0.0"},{"fix":"When calling `client.write()`, pass `overwrite=True` if you intend to replace an existing file (e.g., `client.write(path, overwrite=True)`).","message":"By default, `client.write()` raises an `HdfsError` when the target path already exists. To overwrite an existing file, you must explicitly set `overwrite=True`.","severity":"gotcha","affected_versions":"All"},{"fix":"To delete a directory and its contents, use `client.delete(path, recursive=True)`. Pass `skip_trash=False` (requires Hadoop 2.9+ and trash enabled on the cluster) if you want deleted paths moved to the trash instead of being removed permanently.","message":"Deleting a non-empty directory without `recursive=True` raises an `HdfsError`. This is a safety mechanism.","severity":"gotcha","affected_versions":"All"},{"fix":"Ensure you have a `~/.hdfscli.cfg` file (or an `HDFSCLI_CONFIG` environment variable pointing to one) with valid alias sections (e.g., `[dev.alias]`) that define `url` and, optionally, `user` or `client` (e.g., `KerberosClient`).","message":"Creating clients from aliases with `Config().get_client(alias)` (from `hdfs.config`) relies on a configuration file (default: `~/.hdfscli.cfg`) that defines cluster connection details. Without a matching alias definition, client creation fails.","severity":"gotcha","affected_versions":"All"},{"fix":"Install the kerberos extension (`pip install 'hdfs[kerberos]'`). 
Ensure your `krb5.conf` is correctly configured and you hold a valid Kerberos ticket (e.g., obtained with `kinit`). Refer to the HdfsCLI documentation for detailed Kerberos setup instructions.","message":"`KerberosClient` requires the `hdfs[kerberos]` extra and proper Kerberos configuration on both the client machine and the HDFS cluster. Misconfiguration typically shows up as authentication errors.","severity":"gotcha","affected_versions":"All"}],"env_vars":null,"last_verified":"2026-04-09T00:00:00.000Z","next_check":"2026-07-08T00:00:00.000Z"}