Azure Data Lake Store Filesystem Client Library

1.0.1 · active · verified Tue Apr 07

The `azure-datalake-store` library provides a pure-Python interface for Azure Data Lake Storage Gen 1, offering Pythonic file-system and file objects with capabilities for high-performance uploading and downloading. It is currently at version 1.0.1, having recently transitioned from a series of `0.0.x` pre-releases to a `1.0.x` stable branch. The project is under active development, but the official documentation notes it is 'not yet recommended for general use'. This library specifically supports ADLS Gen 1; for ADLS Gen 2, users should refer to `azure-storage-file-datalake`.

Common errors

Warnings

Install

Imports

Quickstart

This quickstart demonstrates how to authenticate and perform basic file operations (list, create directory, write, read, delete) against Azure Data Lake Store Gen 1 using `AzureDLFileSystem`. It reads the tenant ID, username, password, and store name from environment variables (`AZURE_TENANT_ID`, `AZURE_USERNAME`, `AZURE_PASSWORD`, `AZURE_STORE_NAME`). This is end-user (username and password) authentication; a service principal authenticates with a client ID and client secret instead.

import os
from azure.datalake.store import core, lib

# Credentials are read from AZURE_TENANT_ID, AZURE_USERNAME, AZURE_PASSWORD and AZURE_STORE_NAME.
# The fallback strings are placeholders only: replace them or set the variables
# before running against a real ADLS Gen1 account.

tenant_id = os.environ.get('AZURE_TENANT_ID', 'YOUR_TENANT_ID')
username = os.environ.get('AZURE_USERNAME', 'YOUR_USERNAME')
password = os.environ.get('AZURE_PASSWORD', 'YOUR_PASSWORD')
store_name = os.environ.get('AZURE_STORE_NAME', 'youradlstorename')

try:
    # Authenticate with user credentials (auth() lives in the lib module, not in core)
    token = lib.auth(tenant_id, username, password)

    # Initialize the filesystem client; the token is the first positional argument
    adl = core.AzureDLFileSystem(token, store_name=store_name)
    
    # Example: List contents of the root directory
    print(f"Listing contents of / in {store_name}:")
    items = adl.ls('/', detail=True)
    if items:
        for item in items:
            print(item)
    else:
        print("Directory is empty.")

    # Example: Create a directory and a file
    test_dir = 'mytestdir'
    test_file = f'{test_dir}/testfile.txt'
    if not adl.exists(test_dir):
        adl.mkdir(test_dir)
        print(f"Created directory: {test_dir}")

    with adl.open(test_file, 'wb') as f:
        f.write(b"Hello from Azure Data Lake Store Gen1!")
    print(f"Created and wrote to file: {test_file}")

    # Example: Read the file
    with adl.open(test_file, 'rb') as f:
        content = f.read()
        print(f"Content of {test_file}: {content.decode('utf-8')}")

    # Example: Delete the file and directory
    adl.rm(test_file)
    print(f"Deleted file: {test_file}")
    adl.rmdir(test_dir)
    print(f"Deleted directory: {test_dir}")

except Exception as e:
    print(f"An error occurred: {e}")
    print("Please ensure environment variables for ADLS Gen1 authentication (AZURE_TENANT_ID, AZURE_USERNAME, AZURE_PASSWORD, AZURE_STORE_NAME) are correctly set, or replace placeholders.")
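
The quickstart above authenticates with a username and password. For a service principal, `lib.auth` accepts `client_id` and `client_secret` keyword arguments instead. The following is a minimal sketch under stated assumptions: the helper names `service_principal_kwargs` and `connect_with_service_principal`, and the variable names `AZURE_CLIENT_ID` and `AZURE_CLIENT_SECRET`, are hypothetical choices for this example and not part of the library. It also fails fast on missing variables rather than falling back to placeholders.

```python
import os
from typing import Dict, Mapping, Optional

# Assumed environment variable names for this sketch; the library does not mandate them.
_SP_ENV_VARS = {
    "tenant_id": "AZURE_TENANT_ID",
    "client_id": "AZURE_CLIENT_ID",
    "client_secret": "AZURE_CLIENT_SECRET",
}


def service_principal_kwargs(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect keyword arguments for lib.auth(), failing fast on missing variables."""
    env = os.environ if env is None else env
    missing = [var for var in _SP_ENV_VARS.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {arg: env[var] for arg, var in _SP_ENV_VARS.items()}


def connect_with_service_principal(store_name: str,
                                   env: Optional[Mapping[str, str]] = None):
    """Return an AzureDLFileSystem client authenticated as a service principal."""
    # Imported lazily so the helper above works without the package installed.
    from azure.datalake.store import core, lib

    token = lib.auth(**service_principal_kwargs(env))
    return core.AzureDLFileSystem(token, store_name=store_name)
```

With the three variables set, `adl = connect_with_service_principal('youradlstorename')` should yield a client that can be used exactly like `adl` in the quickstart.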
