Azure Data Lake Store Filesystem Client Library
The `azure-datalake-store` library provides a pure-Python interface to Azure Data Lake Storage Gen1. It offers Pythonic file-system and file objects and supports high-performance uploads and downloads. The current version is 1.0.1; the project recently moved from a series of `0.0.x` pre-releases to a stable `1.0.x` branch. It is under active development, and the official documentation notes that it is 'not yet recommended for general use'. The library supports only ADLS Gen1. For ADLS Gen2, use `azure-storage-file-datalake`.
Warnings
- breaking This library is exclusively for **Azure Data Lake Storage Gen1**. For Azure Data Lake Storage Gen2, the current generation, you **must** use the `azure-storage-file-datalake` library. `azure-datalake-store` targets the Gen1 service endpoint and will not work against Gen2 accounts.
- gotcha The official documentation states that this 'software is under active development and not yet recommended for general use'. This suggests it may not be suitable for critical production workloads or that its APIs could still undergo significant changes.
- breaking Authentication changed significantly between `0.x` and `1.0.x`. Older versions used ADAL and the library's own authentication helpers. Versions `1.0.0-alpha0` and `1.0.1` switched to a 'generic azure token credential for auth instead of custom lib.auth' and removed ADAL support; `lib.auth` now uses MSAL internally.
- breaking In version `1.0.1`, the `concat` operation was removed from multi-part uploads, so large files are now uploaded in a single chunk instead of as separate parts joined with `concat`. Code that uploads very large files should be retested after upgrading.
- deprecated All `0.0.x` versions were explicitly labeled as 'pre-release or preview version' with a warning that there 'will be fairly rapid development and bug fixing, which might result in breaking changes from release to release.' Upgrading directly from `0.0.x` to `1.0.x` will likely involve significant breaking changes.
- breaking Versions `1.0.0-alpha0` and `1.0.1` removed support for older Python versions. The release notes do not say which versions were dropped, so users on older Python environments should verify compatibility before upgrading.
Install
pip install azure-datalake-store
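The warnings differ between `0.0.x` and `1.0.x`, so it helps to confirm which release pip actually installed. The sketch below uses the standard library's `importlib.metadata` (Python 3.8+). `installed_adls_gen1_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_adls_gen1_version():
    """Return the installed azure-datalake-store version string, or None if absent."""
    try:
        return version("azure-datalake-store")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_adls_gen1_version()
    print(v or "azure-datalake-store is not installed")
```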
Imports
- AzureDLFileSystem
from azure.datalake.store import core
adl_fs = core.AzureDLFileSystem(...)
- lib.auth
from azure.datalake.store import lib
token = lib.auth(tenant_id, username, password)
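You can combine the two imports above into a connection helper. The sketch below makes three choices:

- It assumes the installed version still provides the `lib.auth(tenant_id, username, password)` form listed above.
- It passes `token` and `store_name` to `AzureDLFileSystem` by keyword.
- It reports missing environment variables up front. The quickstart, by contrast, falls back to placeholder values and attempts authentication anyway.

`connect_from_env` and `REQUIRED_VARS` are hypothetical names.

```python
import os

# Environment variables the quickstart reads.
REQUIRED_VARS = ("AZURE_TENANT_ID", "AZURE_USERNAME", "AZURE_PASSWORD", "AZURE_STORE_NAME")


def connect_from_env(env=None):
    """Return (filesystem, missing_vars) built from environment variables.

    If any required variable is unset or empty, no authentication is attempted:
    the filesystem is None and missing_vars lists the absent names.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        return None, missing
    # Imported here so the missing-variable path works even without the package installed.
    from azure.datalake.store import core, lib

    # Username/password authentication, as in the lib.auth entry above.
    token = lib.auth(env["AZURE_TENANT_ID"], env["AZURE_USERNAME"], env["AZURE_PASSWORD"])
    # Keyword arguments avoid depending on the constructor's positional order.
    adl = core.AzureDLFileSystem(token=token, store_name=env["AZURE_STORE_NAME"])
    return adl, []


if __name__ == "__main__":
    adl, missing = connect_from_env()
    if missing:
        print("Set these variables first: " + ", ".join(missing))
    else:
        print(adl.ls("/"))
```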
Quickstart
import os
from azure.datalake.store import core, lib
# Set these environment variables before running:
# AZURE_TENANT_ID, AZURE_USERNAME, AZURE_PASSWORD, AZURE_STORE_NAME
# The placeholder defaults below only let the script start; they cannot authenticate against a real ADLS Gen1 account
tenant_id = os.environ.get('AZURE_TENANT_ID', 'YOUR_TENANT_ID')
username = os.environ.get('AZURE_USERNAME', 'YOUR_USERNAME')
password = os.environ.get('AZURE_PASSWORD', 'YOUR_PASSWORD')
store_name = os.environ.get('AZURE_STORE_NAME', 'youradlstorename')
try:
    # Authenticate with username/password (in 1.0.x, lib.auth uses MSAL internally)
    token = lib.auth(tenant_id, username, password)
    # Initialize the Data Lake Store filesystem client; pass token and store_name by keyword
    adl = core.AzureDLFileSystem(token=token, store_name=store_name)
    # Example: list the contents of the root directory
    print(f"Listing contents of / in {store_name}:")
    items = adl.ls('/', detail=True)
    if items:
        for item in items:
            print(item)
    else:
        print("The root directory is empty.")
    # Example: create a directory and a file
    test_dir = 'mytestdir'
    test_file = f'{test_dir}/testfile.txt'
    if not adl.exists(test_dir):
        adl.mkdir(test_dir)
        print(f"Created directory: {test_dir}")
    with adl.open(test_file, 'wb') as f:
        f.write(b"Hello from Azure Data Lake Store Gen1!")
    print(f"Created and wrote to file: {test_file}")
    # Example: read the file back
    with adl.open(test_file, 'rb') as f:
        content = f.read()
    print(f"Content of {test_file}: {content.decode('utf-8')}")
    # Example: delete the file, then the (now empty) directory
    adl.rm(test_file)
    print(f"Deleted file: {test_file}")
    adl.rmdir(test_dir)
    print(f"Deleted directory: {test_dir}")
except Exception as e:
    print(f"An error occurred: {e}")
    print("Please ensure the ADLS Gen1 environment variables (AZURE_TENANT_ID, AZURE_USERNAME, AZURE_PASSWORD, AZURE_STORE_NAME) are set correctly, or replace the placeholders.")
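Every step of the quickstart needs a live Gen1 account, which makes code built on it hard to test offline. One option is to write the logic against the handful of methods the quickstart calls and substitute an in-memory stand-in in tests. The sketch below follows that approach under stated assumptions:

- `FakeADLFileSystem`, `_CommitOnClose` and `round_trip` are hypothetical names and are not part of `azure-datalake-store`.
- The fake mimics only the method names and their rough behavior. Its return values and errors are simplified and will not match the real client exactly.

```python
import io
import posixpath


class _CommitOnClose(io.BytesIO):
    """Buffer writes and pass the bytes to a callback when the file is closed."""

    def __init__(self, commit):
        super().__init__()
        self._commit = commit

    def close(self):
        if not self.closed:
            self._commit(self.getvalue())
        super().close()


class FakeADLFileSystem:
    """Hypothetical in-memory stand-in for the calls used in the quickstart."""

    def __init__(self):
        self.dirs = {"/"}
        self.files = {}

    @staticmethod
    def _norm(path):
        # Treat every path as absolute, e.g. 'mytestdir' -> '/mytestdir'.
        return posixpath.normpath("/" + path.lstrip("/"))

    def exists(self, path):
        p = self._norm(path)
        return p in self.dirs or p in self.files

    def mkdir(self, path):
        self.dirs.add(self._norm(path))

    def ls(self, path="/", detail=False):
        p = self._norm(path)
        if p not in self.dirs:
            raise FileNotFoundError(path)
        children = sorted(
            name for name in self.dirs | set(self.files)
            if name != p and posixpath.dirname(name) == p
        )
        if detail:
            return [{"name": n, "type": "DIRECTORY" if n in self.dirs else "FILE"} for n in children]
        return children

    def open(self, path, mode="rb"):
        p = self._norm(path)
        if mode == "rb":
            if p not in self.files:
                raise FileNotFoundError(path)
            return io.BytesIO(self.files[p])
        if mode == "wb":
            def commit(data):
                self.files[p] = data
            return _CommitOnClose(commit)
        raise ValueError("unsupported mode: " + mode)

    def rm(self, path):
        p = self._norm(path)
        if p not in self.files:
            raise FileNotFoundError(path)
        del self.files[p]

    def rmdir(self, path):
        p = self._norm(path)
        if self.ls(p):
            raise OSError("directory not empty: " + path)
        self.dirs.discard(p)


def round_trip(adl, test_dir="mytestdir"):
    """The quickstart's create/write/read/delete sequence against any client object."""
    test_file = test_dir + "/testfile.txt"
    if not adl.exists(test_dir):
        adl.mkdir(test_dir)
    with adl.open(test_file, "wb") as f:
        f.write(b"Hello from Azure Data Lake Store Gen1!")
    with adl.open(test_file, "rb") as f:
        content = f.read()
    adl.rm(test_file)
    adl.rmdir(test_dir)
    return content


if __name__ == "__main__":
    print(round_trip(FakeADLFileSystem()))
```

In tests, pass `FakeADLFileSystem()` to `round_trip`. Against a real account, pass the `adl` object built in the quickstart instead.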