pyarrowfs-adlgen2
0.2.5 · verified Mon Apr 27 · auth: no · python · maintenance
A PyArrow filesystem interface for Azure Data Lake Storage Gen2. Version 0.2.5 is compatible with Python >=3.6 and Apache Arrow. The last release was in 2021 and the project sees little maintenance.
pip install pyarrowfs-adlgen2
Common errors
error ImportError: cannot import name 'AdlGen2FileSystem' from 'pyarrow.fs'
cause pyarrow.fs does not define AdlGen2FileSystem, and installing pyarrowfs-adlgen2 does not add it. The package ships its filesystem handlers in its own module, pyarrowfs_adlgen2.
fix Run 'pip install pyarrowfs-adlgen2', then 'import pyarrowfs_adlgen2' and wrap a handler such as pyarrowfs_adlgen2.AccountHandler in pyarrow.fs.PyFileSystem.
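A quick way to confirm that both packages are installed in the active environment is to probe for them with the standard library before running any filesystem code. This is a minimal sketch; the helper name is_importable is hypothetical and is not part of PyArrow or pyarrowfs-adlgen2.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec can raise for malformed or half-initialised modules
        return False


if __name__ == "__main__":
    # Report the status of the two packages this page depends on
    for name in ("pyarrow", "pyarrowfs_adlgen2"):
        status = "found" if is_importable(name) else "missing"
        print(f"{name}: {status}")
```

If either module is reported as missing, the ImportError comes from the environment rather than from the calling code.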
error AzureError: Client authentication failed
cause Missing or invalid credentials. The library needs a credential that the Azure SDK accepts: DefaultAzureCredential, a service principal (tenant ID, client ID, and client secret), or a storage account key.
fix Set the environment variables AZURE_TENANT_ID, AZURE_CLIENT_ID, and AZURE_CLIENT_SECRET so that DefaultAzureCredential can pick them up, or pass another valid credential, such as the storage account key, explicitly.
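Before retrying a failed authentication, it can help to see which of the three service principal variables are unset. The sketch below uses only the standard library; the function name missing_service_principal_vars is an assumption made for illustration, not an API of the library or of the Azure SDK.

```python
import os
from typing import List, Mapping, Optional

# Environment variables named in the fix above for service principal login
SERVICE_PRINCIPAL_VARS = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")


def missing_service_principal_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of service principal variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in SERVICE_PRINCIPAL_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_service_principal_vars()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All service principal variables are set.")
```

An empty string counts as missing here, because an empty value fails authentication in the same way as an unset one.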
Warnings
gotcha pyarrow.fs has no AdlGen2FileSystem class. The package exposes its filesystem handlers in the pyarrowfs_adlgen2 module, and a handler must be wrapped in pyarrow.fs.PyFileSystem before PyArrow can use it.
fix Use 'import pyarrowfs_adlgen2' together with 'pyarrow.fs.PyFileSystem(handler)' instead of 'from pyarrow.fs import AdlGen2FileSystem'.
deprecated The library appears to have been unmaintained since 2021 and may not work with newer PyArrow versions (>=12). Its Azure SDK dependencies are also outdated.
fix Consider PyArrow's native Azure filesystem (pyarrow.fs.AzureFileSystem) where your PyArrow build provides it, or pin pyarrow<12.
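Whether the installed PyArrow falls in the range flagged above can be checked at startup. This is a rough sketch, assuming Python 3.8+ for importlib.metadata; the function names and the default threshold of 12 are illustrative and taken from the warning, not from the library.

```python
from importlib import metadata
from typing import Optional


def major_version(version: str) -> int:
    """Extract the leading integer from a version string such as '12.0.1'."""
    return int(version.split(".", 1)[0])


def pyarrow_outside_tested_range(version: Optional[str] = None, limit: int = 12) -> bool:
    """Return True if the PyArrow major version is at or above `limit`.

    When `version` is None the installed version is looked up. A missing
    installation is reported as False because there is nothing to pin.
    """
    if version is None:
        try:
            version = metadata.version("pyarrow")
        except metadata.PackageNotFoundError:
            return False
    return major_version(version) >= limit


if __name__ == "__main__":
    if pyarrow_outside_tested_range():
        print("Installed PyArrow is >=12; consider pinning pyarrow<12.")
```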
Imports
- AccountHandler
from pyarrowfs_adlgen2 import AccountHandler
- PyFileSystem
from pyarrow.fs import PyFileSystem
Quickstart
import azure.identity
import pyarrow.fs
import pyarrowfs_adlgen2

storage_account = "mystorageaccount"
container = "mycontainer"

# Use DefaultAzureCredential (requires azure-identity). It reads
# AZURE_TENANT_ID, AZURE_CLIENT_ID and AZURE_CLIENT_SECRET when they are set.
handler = pyarrowfs_adlgen2.AccountHandler.from_account_name(
    storage_account, azure.identity.DefaultAzureCredential()
)
# Wrap the handler so PyArrow can use it as a filesystem
fs = pyarrow.fs.PyFileSystem(handler)

# Look up file info for a path inside the container
print(fs.get_file_info([f"{container}/some/path"]))
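The quickstart addresses files as container/path strings. The sketch below shows one way to build such paths and to list a directory with pyarrow.fs.FileSelector. Both adls_path and list_directory are hypothetical helpers written for illustration, and fs is assumed to be any PyArrow filesystem object such as the one created above.

```python
import posixpath


def adls_path(container: str, *parts: str) -> str:
    """Join a container name and path segments into 'container/a/b' form."""
    if not container.strip("/"):
        raise ValueError("container must not be empty")
    # Drop stray leading and trailing slashes so segments join cleanly
    cleaned = [p.strip("/") for p in (container, *parts) if p.strip("/")]
    return posixpath.join(*cleaned)


def list_directory(fs, container: str, *parts: str, recursive: bool = False):
    """Return (path, size) pairs for the entries below a directory.

    `fs` is assumed to be a pyarrow.fs.FileSystem instance.
    """
    # Imported lazily so adls_path can be used without PyArrow installed
    from pyarrow.fs import FileSelector

    selector = FileSelector(adls_path(container, *parts), recursive=recursive)
    return [(info.path, info.size) for info in fs.get_file_info(selector)]
```

Passing a FileSelector returns the directory's contents, while passing a list of paths, as the quickstart does, returns one entry per path given.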