Azure Data Lake Store Filesystem Client Library

1.0.1 · active · verified Sun Mar 29

The `azure-datalake-store` library provides a pure-Python interface for Azure Data Lake Storage Gen 1, offering Pythonic file-system and file objects with capabilities for high-performance uploading and downloading. It is currently at version 1.0.1, having recently transitioned from a series of `0.0.x` pre-releases to a `1.0.x` stable branch. The project is under active development, but the official documentation notes it is 'not yet recommended for general use'. This library specifically supports ADLS Gen 1; for ADLS Gen 2, users should refer to `azure-storage-file-datalake`.
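The high-performance transfers mentioned above go through the library's `multithread` module. The following is a minimal sketch, assuming an already authenticated `AzureDLFileSystem` instance is passed in as `adl`. The function name `upload_and_download`, its path arguments, and the `transfer` parameter (a seam for substituting a stand-in module during testing) are hypothetical and not part of the library.

```python
def upload_and_download(adl, local_src, remote_path, local_dst,
                        nthreads=4, transfer=None):
    """Upload a local file to the store, then download it again.

    `adl` is assumed to be an authenticated AzureDLFileSystem.
    `transfer` is a hypothetical hook: by default the library's
    multithread module is used, but any object exposing ADLUploader
    and ADLDownloader can be supplied instead.
    """
    if transfer is None:
        # Imported lazily so this sketch can be loaded without the package.
        from azure.datalake.store import multithread as transfer

    # With the default run=True, constructing the object starts the
    # transfer and blocks until it has finished.
    transfer.ADLUploader(
        adl, rpath=remote_path, lpath=local_src,
        nthreads=nthreads, overwrite=True,
    )
    transfer.ADLDownloader(
        adl, rpath=remote_path, lpath=local_dst,
        nthreads=nthreads, overwrite=True,
    )
```

Raising `nthreads` may help with large files or directory trees, although the best value depends on network bandwidth and file sizes.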

Warnings

Install

Imports

Quickstart

This quickstart demonstrates how to authenticate and perform basic file operations (list, create directory, create file, write, read, delete) with Azure Data Lake Store Gen 1 using `AzureDLFileSystem`. It reads credentials from environment variables (`AZURE_TENANT_ID`, `AZURE_USERNAME`, `AZURE_PASSWORD`, `AZURE_STORE_NAME`), which keeps secrets out of source code. These are user credentials (username and password); a service principal authenticates with a client ID and client secret instead.

import os
from azure.datalake.store import core, lib

# Credentials are read from the environment: AZURE_TENANT_ID, AZURE_USERNAME,
# AZURE_PASSWORD, and AZURE_STORE_NAME. The fallback values below are
# placeholders only; authentication fails unless real values are supplied.

tenant_id = os.environ.get('AZURE_TENANT_ID', 'YOUR_TENANT_ID')
username = os.environ.get('AZURE_USERNAME', 'YOUR_USERNAME')
password = os.environ.get('AZURE_PASSWORD', 'YOUR_PASSWORD')
store_name = os.environ.get('AZURE_STORE_NAME', 'youradlstorename')

try:
    # Authenticate with user credentials; lib.auth returns a token object
    token = lib.auth(tenant_id, username, password)

    # Initialize the Data Lake Store filesystem client. The token is the
    # first argument and the store name is passed by keyword.
    adl = core.AzureDLFileSystem(token, store_name=store_name)
    
    # Example: List contents of the root directory
    print(f"Listing contents of / in {store_name}:")
    items = adl.ls('/', detail=True)
    if items:
        for item in items:
            print(item)
    else:
        print("Directory is empty.")

    # Example: Create a directory and a file
    test_dir = 'mytestdir'
    test_file = f'{test_dir}/testfile.txt'
    if not adl.exists(test_dir):
        adl.mkdir(test_dir)
        print(f"Created directory: {test_dir}")

    with adl.open(test_file, 'wb') as f:
        f.write(b"Hello from Azure Data Lake Store Gen1!")
    print(f"Created and wrote to file: {test_file}")

    # Example: Read the file
    with adl.open(test_file, 'rb') as f:
        content = f.read()
        print(f"Content of {test_file}: {content.decode('utf-8')}")

    # Example: Delete the file and directory
    adl.rm(test_file)
    print(f"Deleted file: {test_file}")
    adl.rmdir(test_dir)
    print(f"Deleted directory: {test_dir}")

except Exception as e:
    print(f"An error occurred: {e}")
    print("Please ensure environment variables for ADLS Gen1 authentication (AZURE_TENANT_ID, AZURE_USERNAME, AZURE_PASSWORD, AZURE_STORE_NAME) are correctly set, or replace placeholders.")
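The quickstart above signs in with a username and password. For a service principal, `lib.auth` also accepts `client_id` and `client_secret` keyword arguments. The following is a minimal sketch under stated assumptions: the environment variable names `AZURE_CLIENT_ID` and `AZURE_CLIENT_SECRET` and the helper names `load_service_principal` and `connect` are chosen for illustration and are not defined by the library.

```python
import os


def load_service_principal(env=None):
    """Collect service-principal settings from an environment mapping.

    The variable names are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    names = {
        "tenant_id": "AZURE_TENANT_ID",
        "client_id": "AZURE_CLIENT_ID",
        "client_secret": "AZURE_CLIENT_SECRET",
    }
    # Fail early with a clear message instead of sending empty credentials.
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in names.items()}


def connect(store_name, env=None):
    """Authenticate as a service principal and return an AzureDLFileSystem."""
    # Imported lazily so load_service_principal works without the package.
    from azure.datalake.store import core, lib

    creds = load_service_principal(env)
    token = lib.auth(
        tenant_id=creds["tenant_id"],
        client_id=creds["client_id"],
        client_secret=creds["client_secret"],
    )
    return core.AzureDLFileSystem(token, store_name=store_name)
```

Calling `connect(os.environ['AZURE_STORE_NAME'])` would then return a client usable in place of `adl` in the quickstart, provided the service principal has been granted access to the store.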
