Snakebite-py3: Pure Python HDFS Client

3.0.6 · active · verified Mon Apr 13

snakebite-py3 is a Python library that provides a pure Python client for the Hadoop Distributed File System (HDFS). It communicates directly with the HDFS NameNode using protobuf messages and implements the Hadoop RPC protocol, offering a native Python alternative to calling Java-based `hadoop fs` commands. This fork, maintained by the Internet Archive, specifically targets Python 3 compatibility. The current version is 3.0.6, released in February 2025.

Warnings

Install

Imports

Quickstart

This quickstart connects to an HDFS NameNode and performs two basic file system operations: listing a directory and creating a new one. The NameNode host and port are read from the `HDFS_NAMENODE_HOST` and `HDFS_NAMENODE_PORT` environment variables and default to `localhost:8020`. The NameNode must be running and reachable from the machine that executes this code.

import os
from snakebite.client import Client

# Configure HDFS NameNode host and port
# Default HDFS RPC port is 8020
host = os.environ.get('HDFS_NAMENODE_HOST', 'localhost')
port = int(os.environ.get('HDFS_NAMENODE_PORT', '8020'))

try:
    # Initialize the HDFS client.
    # use_trash=False makes deletes permanent instead of moving files to the trash.
    # Pass hadoop_version explicitly if the cluster does not use the default protocol version (9).
    client = Client(host, port, use_trash=False)

    # No connection is opened yet; the first RPC (the ls below) contacts the NameNode.
    print(f"Using HDFS NameNode at {host}:{port}")

    # Example: List contents of the root directory
    print("Listing /:")
    for item in client.ls(['/']):
        print(item)

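    # Example: Create a directory if it does not already exist.
    # client.ls() raises an exception for a missing path, so check with client.test() instead.
    test_dir = '/user/test_snakebite_py3'
    if not client.test(test_dir, exists=True):
        print(f"Creating directory {test_dir}")
        # mkdir returns a generator; consume it so the request is actually sent
        list(client.mkdir([test_dir], create_parents=True))
    else:
        print(f"Directory {test_dir} already exists.")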

except Exception as e:
    print(f"An error occurred: {e}")
    print("Please ensure your HDFS NameNode is running and accessible at the specified host and port.")
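
The quickstart prints each item from `client.ls()` as a raw dictionary. The sketch below shows one way to turn those dictionaries into readable lines and to split directories from files. It assumes each entry carries the keys `path`, `file_type` (`'d'` for a directory, `'f'` for a file), `length`, and `modification_time` in milliseconds since the Unix epoch. The helper names `format_entry` and `split_listing` are hypothetical and not part of snakebite, and the sample data is hand-written so the block runs without a cluster.

```python
from datetime import datetime, timezone


def format_entry(entry):
    """Render one listing entry as a single `hadoop fs -ls`-style line."""
    kind = "d" if entry.get("file_type") == "d" else "-"
    # Assumption: modification_time is milliseconds since the Unix epoch
    mtime = datetime.fromtimestamp(
        entry.get("modification_time", 0) / 1000, tz=timezone.utc
    )
    return "{} {:>12} {} {}".format(
        kind,
        entry.get("length", 0),
        mtime.strftime("%Y-%m-%d %H:%M"),
        entry.get("path", "?"),
    )


def split_listing(entries):
    """Separate an iterable of entries into (directories, files) path lists."""
    dirs, files = [], []
    for entry in entries:
        target = dirs if entry.get("file_type") == "d" else files
        target.append(entry.get("path"))
    return dirs, files


# Hand-written sample data standing in for the output of client.ls(['/'])
sample = [
    {"path": "/tmp", "file_type": "d", "length": 0, "modification_time": 0},
    {"path": "/data.csv", "file_type": "f", "length": 2048,
     "modification_time": 86400000},
]

for entry in sample:
    print(format_entry(entry))
print(split_listing(sample))
```

Against a live cluster, the same helpers could be applied to `client.ls(['/'])` in place of `sample`. Because `ls` returns a generator, materialize it with `list(...)` first if both helpers need to iterate over it.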
