AzureML fsspec Protocol Handler
The `azureml-fsspec` library lets the `fsspec` (Filesystem Spec) library work with Azure Machine Learning datastores. It registers the `azureml://` and `adl://` protocols, so files in Azure ML datastores can be accessed, read, and written through the familiar `fsspec` API. The current version is 1.3.1. It is released as part of the broader Azure ML SDK ecosystem and typically follows that release cadence.
Common errors
- ValueError: Protocol not known: azureml
  cause: The `azureml` protocol has not been registered with `fsspec`.
  fix: Ensure `azureml-fsspec` is installed (`pip install azureml-fsspec`) and that `import azureml.fsspec` runs before you use an `azureml://` or `adl://` URI with `fsspec`.
- AuthenticationException: Authentication failed. Please check your credentials.
  cause: The AzureML SDK could not authenticate with your Azure subscription or workspace.
  fix: Verify your Azure authentication setup: run `az login`, set the service-principal environment variables (`AZURE_CLIENT_ID`, `AZURE_CLIENT_SECRET`, `AZURE_TENANT_ID`), or make sure your managed identity has the correct permissions.
- azureml.exceptions._azureml_exception.DatastoreNotFoundException: Datastore with name '<your_datastore_name>' not found.
  cause: The datastore name in the URI or `fsspec.filesystem` call does not exist in the targeted Azure ML workspace, or the workspace details are wrong.
  fix: Check the datastore name for typos, and make sure the `subscription_id`, `resource_group`, and `workspace_name` parameters (or environment variables) point to the Azure ML workspace where the datastore is registered.
- ValueError: Invalid URI format: 'azureml://...' (or a similar URI parsing error)
  cause: The `azureml://` or `adl://` URI does not follow the expected structure or is missing required components.
  fix: Use the datastore URI format `azureml://datastores/<datastore_name>/paths/<path_to_file>`, or the fully qualified form `azureml://subscriptions/<subscription_id>/resourcegroups/<resource_group>/workspaces/<workspace_name>/datastores/<datastore_name>/paths/<path_to_file>`. Both the datastore name and the path must be present.
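Because many of the errors above come down to a malformed URI, it can help to build and check datastore URIs in one place before handing them to `fsspec`. The sketch below assumes the two URI shapes listed in the last fix (short and fully qualified, with lowercase segment names); `build_datastore_uri` and `parse_datastore_uri` are hypothetical helpers, not part of `azureml-fsspec`.

```python
import re
from typing import Optional

# Hypothetical helpers (not part of azureml-fsspec). The regex mirrors the two
# datastore URI shapes described above; segment names are matched case-sensitively.
_URI_RE = re.compile(
    r"^azureml://"
    r"(?:subscriptions/(?P<subscription_id>[^/]+)/"
    r"resourcegroups/(?P<resource_group>[^/]+)/"
    r"workspaces/(?P<workspace_name>[^/]+)/)?"
    r"datastores/(?P<datastore_name>[^/]+)/"
    r"paths/(?P<path>.*)$"
)


def build_datastore_uri(datastore_name: str, path: str = "",
                        subscription_id: Optional[str] = None,
                        resource_group: Optional[str] = None,
                        workspace_name: Optional[str] = None) -> str:
    """Build a short or fully qualified azureml:// datastore URI."""
    workspace_parts = (subscription_id, resource_group, workspace_name)
    prefix = "azureml://"
    if any(workspace_parts):
        # A partial workspace reference is exactly the "incomplete URI" case above.
        if not all(workspace_parts):
            raise ValueError("subscription_id, resource_group and workspace_name must be given together")
        prefix += (f"subscriptions/{subscription_id}/resourcegroups/{resource_group}/"
                   f"workspaces/{workspace_name}/")
    # Strip a leading slash so the path segment does not start with "//".
    return f"{prefix}datastores/{datastore_name}/paths/{path.lstrip('/')}"


def parse_datastore_uri(uri: str) -> dict:
    """Split an azureml:// datastore URI into its components, or raise ValueError."""
    match = _URI_RE.match(uri)
    if match is None:
        raise ValueError(f"Invalid URI format: {uri!r}")
    # Drop the workspace fields when the short form was used.
    return {key: value for key, value in match.groupdict().items() if value is not None}


if __name__ == "__main__":
    uri = build_datastore_uri("workspaceblobstore", "data/train.csv")
    print(uri)
    print(parse_datastore_uri(uri))
```

Validating up front turns a vague failure deep inside the filesystem call into an immediate `ValueError` that names the offending URI.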
Warnings
- gotcha An authenticated Azure identity is required. This library relies on the underlying AzureML SDK's authentication mechanisms: service-principal environment variables (AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, AZURE_TENANT_ID), Azure CLI login (`az login`), or managed identity. Which one applies depends on where the code runs.
- gotcha Forgetting to import `azureml.fsspec` will prevent the 'azureml' and 'adl' protocols from being registered with `fsspec`, leading to 'No such protocol' errors.
- gotcha An incorrect or incomplete Azure ML URI causes parsing or lookup errors. The `azureml://` protocol needs the subscription ID, resource group, workspace name, datastore name, and path; these can be given in the URI itself or passed as keyword arguments to `fsspec.filesystem()`.
- gotcha Requires specific minimum versions of `fsspec` and `azureml-core`. Older versions of these dependencies might not provide the necessary APIs or features.
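Several of these gotchas can be caught before any AzureML call is made. The sketch below is a hypothetical pre-flight check using only the standard library: it reports the installed versions of the packages named above (it does not enforce minimums, since they are not stated here) and lists which service-principal variables are unset. An unset variable is not necessarily an error if you authenticate with `az login` or a managed identity instead.

```python
import os
from importlib.metadata import PackageNotFoundError, version

# Service-principal variables named in the authentication warning above.
SERVICE_PRINCIPAL_VARS = ("AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET", "AZURE_TENANT_ID")
# Distributions mentioned in the warnings above; minimum versions are not checked.
PACKAGES = ("azureml-fsspec", "fsspec", "azureml-core")


def installed_versions(packages=PACKAGES) -> dict:
    """Map each distribution name to its installed version string, or None if missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


def missing_service_principal_vars(environ=os.environ) -> list:
    """Return the service-principal variables that are unset or empty."""
    return [var for var in SERVICE_PRINCIPAL_VARS if not environ.get(var)]


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
    missing = missing_service_principal_vars()
    if missing:
        print("Service-principal variables not set:", ", ".join(missing),
              "(fine if you rely on `az login` or a managed identity instead)")
```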
Install
pip install azureml-fsspec
Imports
- AzureMachineLearningFileSystem
from azureml.fsspec import AzureMachineLearningFileSystem
- Protocol Registration
import azureml.fsspec
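To confirm that the registration step worked in a given environment, a small diagnostic like the one below can help. It is a sketch that assumes, as stated above, that importing `azureml.fsspec` is what makes the `azureml` protocol available; `azureml_protocol_status` is a hypothetical helper name. The lookup itself uses `fsspec.get_filesystem_class`, fsspec's own resolver, which raises `ValueError` for an unknown protocol (or `ImportError` when a known implementation cannot be imported).

```python
import importlib
import importlib.util


def azureml_protocol_status() -> str:
    """Report whether fsspec can resolve the "azureml" protocol in this environment."""
    if importlib.util.find_spec("fsspec") is None:
        return "fsspec not installed"
    import fsspec

    try:
        # Assumed registration step, per the Imports section: import the package.
        importlib.import_module("azureml.fsspec")
    except ImportError:
        return "azureml-fsspec not installed"

    try:
        # Ask fsspec to resolve the protocol exactly as fsspec.filesystem() would.
        fsspec.get_filesystem_class("azureml")
    except (ValueError, ImportError):
        return "azureml protocol not registered"
    return "azureml protocol registered"


if __name__ == "__main__":
    print(azureml_protocol_status())
```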
Quickstart
import os

import fsspec
import azureml.fsspec  # This import registers the "azureml" and "adl" protocols

print("azureml-fsspec imported, protocols registered.")

# AzureML workspace details, read from environment variables.
# In a real scenario these must be set, or passed via the fsspec.filesystem() call or the URI.
subscription_id = os.environ.get("AZUREML_SUBSCRIPTION_ID", "")
resource_group = os.environ.get("AZUREML_RESOURCE_GROUP", "")
workspace_name = os.environ.get("AZUREML_WORKSPACE_NAME", "")
datastore_name = os.environ.get("AZUREML_DATASTORE_NAME", "workspaceblobstore")  # Default datastore created with every workspace

if all([subscription_id, resource_group, workspace_name]):
    print(f"\nAttempting to get AzureML filesystem instance for datastore: {datastore_name}...")
    try:
        # Get a filesystem instance by explicitly passing the workspace details and datastore.
        fs = fsspec.filesystem(
            "azureml",
            subscription_id=subscription_id,
            resource_group=resource_group,
            workspace_name=workspace_name,
            datastore_name=datastore_name,
        )
        print(f"Successfully initialized AzureML filesystem object: {type(fs)}")

        # Example: list the root of the datastore.
        # This only succeeds with valid credentials and permissions.
        # contents = fs.ls("/")
        # print(f"First 5 items from {datastore_name} root: {contents[:5]}...")
        # print(f"Total items found: {len(contents)}")
    except Exception as e:
        print(f"Could not initialize AzureML filesystem (common without a full setup): {e}")
        print("Ensure AZUREML_SUBSCRIPTION_ID, AZUREML_RESOURCE_GROUP and AZUREML_WORKSPACE_NAME "
              "(and optionally AZUREML_DATASTORE_NAME) are set correctly or passed as arguments.")
else:
    print("\nSkipping AzureML filesystem initialization: AzureML environment variables not fully set.")
    print("Set AZUREML_SUBSCRIPTION_ID, AZUREML_RESOURCE_GROUP and AZUREML_WORKSPACE_NAME "
          "(AZUREML_DATASTORE_NAME defaults to workspaceblobstore) for a live example.")

print("\nDemonstrating direct import of the AzureMachineLearningFileSystem class:")
from azureml.fsspec import AzureMachineLearningFileSystem
print(f"AzureMachineLearningFileSystem class found: {AzureMachineLearningFileSystem}")