Azure Storage File DataLake Client Library

Version 12.23.0, verified Tue May 12. Python install: verified; auth: not verified.

Microsoft Azure File DataLake Storage Client Library for Python provides APIs for interacting with Azure Data Lake Storage Gen2, which offers hierarchical namespace capabilities on top of Azure Blob Storage. This library enables developers to manage file systems, directories, and files, including operations for creating, renaming, deleting, and managing access control lists (ACLs). Azure SDKs typically receive frequent updates, often monthly or bi-monthly, focusing on new features, bug fixes, and alignment with new service API versions.

pip install azure-storage-file-datalake
error ModuleNotFoundError: No module named 'azure.storage.filedatalake'
cause The 'azure-storage-file-datalake' package is not installed or is not accessible within the current Python environment.
fix Install the package using pip (`pip install azure-storage-file-datalake`), making sure pip targets the same interpreter or virtual environment that runs your script.
error azure.core.exceptions.ClientAuthenticationError: Operation returned an invalid status 'Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.' ErrorCode:AuthenticationFailed.
cause The credentials (account key, SAS token, or Azure AD token) provided are incorrect, expired, or the authenticating principal lacks the necessary permissions to perform the requested operation on the storage resource.
fix Verify the storage account name and access key, ensure the SAS token is valid, unexpired, and correctly formatted, or check the Azure AD principal's role assignments (e.g., 'Storage Blob Data Contributor') and the account's firewall settings. Also check for client clock skew, which invalidates request signatures, and keep the SDK and its HTTP dependencies (e.g., urllib3) up to date.
error Operation returned an invalid status code 'NotFound'. Account: '<account>'. FileSystem: '<filesystem>'. Path: '<path>'. ErrorCode: 'PathNotFound'. Message: 'The specified path does not exist.'
cause The specified file system, directory, or file path does not exist in the Azure Data Lake Storage Gen2 account, or the storage account itself does not exist or has hierarchical namespace disabled.
fix Double-check the spelling and existence of the storage account, file system, and the complete path. Confirm that the storage account has hierarchical namespace (HNS) enabled for Data Lake Storage Gen2 operations.
error AttributeError: 'str' object has no attribute 'get'
cause This error often occurs when an invalid or improperly constructed credential object, such as a raw access token string, is passed to a client constructor (e.g., `DataLakeServiceClient`) that expects a `TokenCredential` instance from `azure.identity`. It can also happen when mixing `azure.identity.aio` imports with synchronous usage.
fix Instantiate a proper TokenCredential (e.g., DefaultAzureCredential() or ClientSecretCredential(...) from azure.identity) and pass it as the credential argument rather than a plain string. If using the asynchronous clients, install aiohttp and take credentials from azure.identity.aio inside an async context.
breaking This library (`azure-storage-file-datalake`) is specifically for Azure Data Lake Storage Gen2. Its API differs significantly from older Gen1 Data Lake Store libraries (`azure-datalake-store`) and general Blob Storage (`azure-storage-blob`) when performing hierarchical operations. Migration from older versions or relying solely on Blob APIs for Gen2 may require substantial code changes to leverage full hierarchical namespace capabilities.
fix Review official migration guides for Azure Data Lake Storage Gen2 and ensure DataLake-specific clients (e.g., `DataLakeServiceClient`, `DataLakeDirectoryClient`) are used for hierarchical operations and ACL management.
gotcha To use secure, token-based authentication (recommended for production), the `azure-identity` library is required. Failure to install `azure-identity` will result in a `ModuleNotFoundError` when attempting to import `azure.identity`. Using account keys or connection strings directly for authentication is less secure and not recommended for production environments.
fix Install `azure-identity` (e.g., `pip install azure-identity`). Then, prefer token-based authentication using `azure-identity`'s `DefaultAzureCredential` with Microsoft Entra ID (Azure AD). This allows for various secure authentication flows, including environment variables, managed identities, and Azure CLI, without hardcoding sensitive keys.
gotcha When writing data to a file using `DataLakeFileClient.append_data()`, the data is buffered and not immediately committed or visible until `DataLakeFileClient.flush_data()` is explicitly called. Forgetting to call `flush_data()` can result in incomplete or invisible file content.
fix Always call `DataLakeFileClient.flush_data()` after one or more `append_data()` calls to ensure all buffered data is committed to the file and becomes persistent and visible.
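A minimal helper illustrating the sequence; `file_client` is assumed to be an already-authenticated `DataLakeFileClient`:

```python
def upload_bytes(file_client, data: bytes) -> None:
    """Create the file, append the payload, then commit it.

    Without the final flush_data call the appended bytes stay
    uncommitted and the file reads back empty.
    """
    file_client.create_file()
    file_client.append_data(data, offset=0, length=len(data))
    file_client.flush_data(len(data))  # commits everything up to this offset
```

For multiple appends, track a running offset across the `append_data` calls and flush once with the total length at the end.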
gotcha Azure Data Lake Storage Gen2 supports multi-protocol access, allowing both Blob APIs and Data Lake APIs. However, for operations unique to hierarchical namespaces (like atomic directory renames, creating directories, and fine-grained ACLs), it is crucial to use the `azure-storage-file-datalake` APIs. Using Blob APIs for these specific tasks may result in incorrect behavior, errors, or a lack of functionality.
fix When working with hierarchical namespace features (directories, ACLs), always use the `azure-storage-file-datalake` client library and its dedicated methods.
gotcha `azure-storage-file-datalake` often relies on `azure-identity` for token-based authentication. `azure-identity` is a separate package and must be installed explicitly alongside `azure-storage-file-datalake` if its authentication methods (e.g., `DefaultAzureCredential`) are used; otherwise a `ModuleNotFoundError` will occur.
fix Ensure `azure-identity` is installed in your environment (e.g., `pip install azure-identity`) if your application uses its authentication mechanisms.
| python | os / libc | status | wheel | install | import | disk | mem | side effects |
|--------|-----------|--------|-------|---------|--------|------|------|--------------|
| 3.9  | alpine (musl) | wheel | -    | 1.30s | - | 47.5M | 19.3M | clean |
| 3.9  | alpine (musl) | -     | -    | 1.21s | - | 46.4M | 19.1M | -     |
| 3.9  | slim (glibc)  | wheel | 4.4s | 1.22s | - | 48M   | 19.3M | clean |
| 3.9  | slim (glibc)  | -     | -    | 1.14s | - | 47M   | 19.1M | -     |
| 3.10 | alpine (musl) | wheel | -    | 1.33s | - | 47.5M | 19.5M | clean |
| 3.10 | alpine (musl) | -     | -    | 1.29s | - | 46.4M | 19.2M | -     |
| 3.10 | slim (glibc)  | wheel | 3.7s | 1.03s | - | 48M   | 19.5M | clean |
| 3.10 | slim (glibc)  | -     | -    | 0.94s | - | 47M   | 19.2M | -     |
| 3.11 | alpine (musl) | wheel | -    | 1.54s | - | 51.7M | 22.4M | clean |
| 3.11 | alpine (musl) | -     | -    | 1.67s | - | 50.5M | 22.2M | -     |
| 3.11 | slim (glibc)  | wheel | 3.5s | 1.39s | - | 52M   | 22.4M | clean |
| 3.11 | slim (glibc)  | -     | -    | 1.34s | - | 51M   | 22.2M | -     |
| 3.12 | alpine (musl) | wheel | -    | 1.76s | - | 43.2M | 22.2M | clean |
| 3.12 | alpine (musl) | -     | -    | 1.90s | - | 42.0M | 21.9M | -     |
| 3.12 | slim (glibc)  | wheel | 3.2s | 1.69s | - | 44M   | 22.2M | clean |
| 3.12 | slim (glibc)  | -     | -    | 1.76s | - | 42M   | 21.9M | -     |
| 3.13 | alpine (musl) | wheel | -    | 1.86s | - | 42.8M | 23.1M | clean |
| 3.13 | alpine (musl) | -     | -    | 1.96s | - | 41.5M | 22.8M | -     |
| 3.13 | slim (glibc)  | wheel | 3.2s | 1.66s | - | 43M   | 23.1M | clean |
| 3.13 | slim (glibc)  | -     | -    | 1.85s | - | 42M   | 22.8M | -     |

Demonstrates how to create a `DataLakeServiceClient` using `DefaultAzureCredential` for authentication and then lists all file systems (containers) within the Azure Data Lake Storage Gen2 account. Ensure `AZURE_STORAGE_ACCOUNT_NAME` and appropriate Azure Identity environment variables (e.g., `AZURE_TENANT_ID`, `AZURE_CLIENT_ID`, `AZURE_CLIENT_SECRET`) are set for successful execution.

import os
from azure.storage.filedatalake import DataLakeServiceClient
from azure.identity import DefaultAzureCredential

# Ensure environment variables are set for authentication and account URL:
# AZURE_STORAGE_ACCOUNT_NAME: Name of your Azure Data Lake Storage Gen2 account
# AZURE_TENANT_ID, AZURE_CLIENT_ID, AZURE_CLIENT_SECRET for DefaultAzureCredential

try:
    account_name = os.environ.get("AZURE_STORAGE_ACCOUNT_NAME")
    if not account_name:
        raise ValueError("AZURE_STORAGE_ACCOUNT_NAME environment variable not set.")

    # Construct the account URL for Data Lake Storage Gen2
    # Note: .dfs.core.windows.net is used for Data Lake Storage Gen2 endpoints
    account_url = f"https://{account_name}.dfs.core.windows.net"

    # Authenticate using DefaultAzureCredential (recommended for production)
    # DefaultAzureCredential tries various authentication methods, including environment variables,
    # managed identity, Azure CLI, etc.
    credential = DefaultAzureCredential()

    # Create a DataLakeServiceClient
    service_client = DataLakeServiceClient(account_url, credential=credential)

    print(f"Listing file systems in account: {account_name}")
    file_systems = service_client.list_file_systems()
    for fs in file_systems:
        print(f"- {fs.name}")

except Exception as e:
    print(f"An error occurred: {e}")
    print("Please ensure AZURE_STORAGE_ACCOUNT_NAME and authentication credentials (e.g., AZURE_TENANT_ID, AZURE_CLIENT_ID, AZURE_CLIENT_SECRET for service principal) are correctly configured.")