AzureML fsspec Protocol Handler

1.3.1 · active · verified Thu Apr 16

The `azureml-fsspec` package lets `fsspec` (Filesystem Spec) work with Azure Machine Learning datastores. It registers the `azureml://` protocol, so files in Azure ML datastores can be listed, read, and written through the familiar `fsspec` API. The current version is 1.3.1; it is released as part of the broader Azure ML SDK ecosystem and typically follows that SDK's release cadence.

Quickstart

This quickstart imports `azureml-fsspec` and creates an `AzureMachineLearningFileSystem` instance for a datastore. The filesystem is addressed by a long-form datastore URI built from the workspace details (subscription ID, resource group, workspace name, datastore name), which this example reads from environment variables. Actual I/O operations (such as listing files) are commented out, because they need a live Azure ML workspace with valid credentials and permissions.

import os

from azureml.fsspec import AzureMachineLearningFileSystem

print("azureml-fsspec imported.")

# Workspace details. The environment variable names below are this example's
# own convention; the values are only used to build the datastore URI.
subscription_id = os.environ.get("AZUREML_SUBSCRIPTION_ID", "")
resource_group = os.environ.get("AZUREML_RESOURCE_GROUP", "")
workspace_name = os.environ.get("AZUREML_WORKSPACE_NAME", "")
datastore_name = os.environ.get("AZUREML_DATASTORE_NAME", "workspaceblobstore")  # common default datastore name

if all([subscription_id, resource_group, workspace_name]):
    # Long-form URI that identifies one datastore in one workspace.
    uri = (
        f"azureml://subscriptions/{subscription_id}"
        f"/resourcegroups/{resource_group}"
        f"/workspaces/{workspace_name}"
        f"/datastores/{datastore_name}"
    )
    print(f"\nCreating AzureML filesystem for datastore: {datastore_name}...")
    try:
        fs = AzureMachineLearningFileSystem(uri)
        print(f"Initialized AzureML filesystem object: {type(fs)}")

        # Listing the datastore root only succeeds with valid credentials
        # and permissions, so it is left commented out.
        # contents = fs.ls()
        # print(f"First 5 items from {datastore_name} root: {contents[:5]}")
        # print(f"Total items found: {len(contents)}")
    except Exception as e:
        print(f"Could not initialize AzureML filesystem (common without a full setup): {e}")
        print("Check that AZUREML_SUBSCRIPTION_ID, AZUREML_RESOURCE_GROUP, "
              "AZUREML_WORKSPACE_NAME, and AZUREML_DATASTORE_NAME are set correctly.")
else:
    print("\nSkipping AzureML filesystem initialization: workspace environment variables not fully set.")
    print("Set AZUREML_SUBSCRIPTION_ID, AZUREML_RESOURCE_GROUP, AZUREML_WORKSPACE_NAME, "
          "and AZUREML_DATASTORE_NAME for a live example.")
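Azure ML datastores are commonly addressed by a long-form `azureml://` URI that spells out the subscription, resource group, workspace, and datastore, optionally followed by `/paths/<path>`. The sketch below shows one way to assemble such a URI from the four workspace details and to fail early when one is missing. The helper name `build_datastore_uri` is hypothetical and not part of `azureml-fsspec`; it uses only the standard library and does not contact Azure.

```python
from typing import Optional


def build_datastore_uri(
    subscription_id: str,
    resource_group: str,
    workspace_name: str,
    datastore_name: str,
    path: Optional[str] = None,
) -> str:
    """Assemble a long-form azureml:// datastore URI (hypothetical helper)."""
    details = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
        "datastore_name": datastore_name,
    }
    # Report every missing detail at once instead of failing on the first.
    missing = [name for name, value in details.items() if not value]
    if missing:
        raise ValueError("missing workspace details: " + ", ".join(missing))

    uri = (
        f"azureml://subscriptions/{subscription_id}"
        f"/resourcegroups/{resource_group}"
        f"/workspaces/{workspace_name}"
        f"/datastores/{datastore_name}"
    )
    if path:
        # A path inside the datastore goes under the "paths" segment.
        uri += "/paths/" + path.lstrip("/")
    return uri


if __name__ == "__main__":
    # Placeholder values; substitute real workspace details.
    print(build_datastore_uri("<subscription-id>", "<resource-group>",
                              "<workspace>", "workspaceblobstore",
                              "data/train.csv"))
```

Validating the details before building the URI keeps a misconfigured environment from surfacing later as a harder-to-read authentication or lookup error.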
