pyarrowfs-adlgen2

0.2.5 verified Mon Apr 27 auth: no python maintenance

PyArrow filesystem implementation for Azure Data Lake Storage Gen2. The current version is 0.2.5; it requires Python >=3.6 and plugs into Apache Arrow through the pyarrow.fs interface. The last release was in 2021, and the project receives little maintenance.

pip install pyarrowfs-adlgen2
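error ImportError: cannot import name 'AdlGen2FileSystem' from 'pyarrow.fs'
cause pyarrow.fs has no AdlGen2FileSystem class, and installing pyarrowfs-adlgen2 does not add one. The library provides handler classes in its own pyarrowfs_adlgen2 module; a handler becomes a filesystem once it is wrapped in pyarrow.fs.PyFileSystem.
fix
Run 'pip install pyarrowfs-adlgen2', then 'import pyarrowfs_adlgen2', build a handler with pyarrowfs_adlgen2.AccountHandler.from_account_name(...), and pass it to pyarrow.fs.PyFileSystem.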
error AzureError: Client authentication failed
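cause Missing or invalid credentials. The handler needs a credential when it is created: an azure-identity credential object such as DefaultAzureCredential or ClientSecretCredential (tenant_id, client_id, client_secret), or a storage account key.
fix
For DefaultAzureCredential, set the environment variables AZURE_TENANT_ID, AZURE_CLIENT_ID, and AZURE_CLIENT_SECRET. Otherwise, pass an explicit credential or the account key when creating the handler.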
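Before constructing a credential, it can help to confirm that the expected environment variables are actually set. The following is a minimal, standard-library-only sketch; the helper name missing_azure_env is hypothetical and is not part of pyarrowfs-adlgen2 or the Azure SDK.

```python
import os

# Variables read by azure-identity's environment-based service principal login.
REQUIRED_VARS = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")


def missing_azure_env(env=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_azure_env()
    if missing:
        print("Missing credentials:", ", ".join(missing))
    else:
        print("All service principal variables are set.")
```

Running this check first separates a plain configuration problem from a credential that the service actually rejected.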
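gotcha The library does not register a filesystem class in pyarrow.fs. It exposes handlers (such as AccountHandler) in the pyarrowfs_adlgen2 module, and a handler must be wrapped in pyarrow.fs.PyFileSystem before use.
fix Use 'import pyarrowfs_adlgen2' together with 'pyarrow.fs.PyFileSystem(handler)' instead of 'from pyarrow.fs import AdlGen2FileSystem'.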
deprecated The library appears to have been unmaintained since 2021 and may not work with newer PyArrow versions (>=12). Its Azure SDK dependencies are also outdated.
fix Consider PyArrow's native Azure support (pyarrow.fs.AzureFileSystem, present in newer PyArrow builds) where it is available, or pin pyarrow<12.
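One way to act on this advice is to detect at runtime whether the installed PyArrow exposes a native Azure filesystem, and to fall back to pyarrowfs-adlgen2 otherwise. The sketch below rests on assumptions: pyarrow.fs.AzureFileSystem is only present in newer PyArrow builds compiled with Azure support, and choose_backend and detect_backend are hypothetical helper names.

```python
import importlib


def choose_backend(fs_module):
    """Return "native" if the module exposes AzureFileSystem.

    Hypothetical helper; `fs_module` is normally the imported pyarrow.fs.
    Builds without Azure support may raise ImportError on attribute access,
    so both exception types are treated as "not available".
    """
    try:
        getattr(fs_module, "AzureFileSystem")
    except (AttributeError, ImportError):
        return "pyarrowfs-adlgen2"
    return "native"


def detect_backend():
    """Import pyarrow.fs lazily and report which backend looks usable."""
    try:
        fs_module = importlib.import_module("pyarrow.fs")
    except ImportError:
        return "pyarrow not installed"
    return choose_backend(fs_module)


if __name__ == "__main__":
    print(detect_backend())
```

Feature detection is used here rather than a version comparison because availability depends on how PyArrow was built as well as on its version number.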

Initialize ADLS Gen2 filesystem and list files using PyArrow.

import azure.identity
import pyarrow.fs
import pyarrowfs_adlgen2

storage_account = "mystorageaccount"
container = "mycontainer"

# DefaultAzureCredential (requires azure-identity) can read AZURE_TENANT_ID,
# AZURE_CLIENT_ID and AZURE_CLIENT_SECRET from the environment.
credential = azure.identity.DefaultAzureCredential()

# Build an account-level handler and wrap it in a PyArrow filesystem.
handler = pyarrowfs_adlgen2.AccountHandler.from_account_name(
    storage_account, credential
)
fs = pyarrow.fs.PyFileSystem(handler)

# List files; with an account-level handler, paths start with the container name
selector = pyarrow.fs.FileSelector(f"{container}/some/path")
print(fs.get_file_info(selector))
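
Once a filesystem object exists, it can be passed to PyArrow readers. The sketch below reads a Parquet file through an account-level handler wrapped in pyarrow.fs.PyFileSystem. The helper names adls_path and read_parquet_from_adls are hypothetical, the account, container, and path values are placeholders, and the third-party imports are deferred so the module loads even where those packages are missing.

```python
def adls_path(container, *parts):
    """Join a container name and path segments into "<container>/<path>".

    Hypothetical helper; leading and trailing slashes are stripped.
    """
    segments = [container.strip("/")]
    segments += [p.strip("/") for p in parts if p.strip("/")]
    return "/".join(segments)


def read_parquet_from_adls(account_name, container, relative_path):
    """Read one Parquet file from ADLS Gen2 into a pyarrow.Table.

    Assumes azure-identity, pyarrow and pyarrowfs-adlgen2 are installed
    and that DefaultAzureCredential can authenticate in this environment.
    """
    # Deferred imports: only needed when the function is actually called.
    import azure.identity
    import pyarrow.fs
    import pyarrow.parquet as pq
    import pyarrowfs_adlgen2

    handler = pyarrowfs_adlgen2.AccountHandler.from_account_name(
        account_name, azure.identity.DefaultAzureCredential()
    )
    fs = pyarrow.fs.PyFileSystem(handler)
    return pq.read_table(adls_path(container, relative_path), filesystem=fs)


if __name__ == "__main__":
    # Placeholder names; prints the path that would be requested.
    print(adls_path("mycontainer", "some/path", "data.parquet"))
```

Calling read_parquet_from_adls("mystorageaccount", "mycontainer", "some/path/data.parquet") would return a table only if the account, container, and file exist and the credential has read access.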