MLTable
The `mltable` package provides Python APIs for creating, loading, and managing MLTable, a declarative, standardized format for defining data for machine learning workloads. It is used mainly within Azure Machine Learning to describe datasets drawn from sources such as local files, Delta Lake, Parquet, and CSV. The current version is 1.6.3, and new releases generally follow a regular cadence.
Common errors
- ModuleNotFoundError: No module named 'mltable'
  cause: The `mltable` package is not installed or not accessible in the current Python environment.
  fix: Run `pip install mltable` in your active Python environment.
- FileNotFoundError: No such file or directory: 'path/to/my_mltable_data'
  cause: The specified path to the MLTable directory, or a file referenced within the MLTable definition, does not exist or is inaccessible.
  fix: Verify that the path provided to `mltable.load()` or within the MLTable definition points to an existing directory or file. Ensure correct casing and full paths if not using URIs. Check file permissions.
- mltable.exceptions.ValidationException: Validation error while parsing MLTable at 'MLTable'
  cause: The `MLTable` YAML file (or its JSON equivalent) at the specified path is syntactically incorrect or does not conform to the expected MLTable schema.
  fix: Inspect the `MLTable` file for syntax errors (e.g., indentation, missing colons) and ensure it includes all required fields and adheres to the official MLTable schema for its version. The error message usually provides more details on what failed validation.
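Several of the failures above can be caught before calling `mltable.load()`. The following is a minimal sketch of such a pre-flight check, using only the standard library. The helper name `check_mltable_dir` is hypothetical and not part of `mltable`. It only confirms that the directory and a non-empty `MLTable` file exist, and does not validate the file against the MLTable schema.

```python
from pathlib import Path


def check_mltable_dir(path):
    """Return a list of problems found in a local MLTable directory.

    Hypothetical helper (not part of the mltable package). An empty list
    means the basic layout looks fine; schema validation is still left
    to mltable itself.
    """
    root = Path(path)
    if not root.is_dir():
        return [f"directory not found: {root}"]

    problems = []
    definition = root / "MLTable"  # the definition file is named exactly 'MLTable'
    if not definition.is_file():
        problems.append(f"no 'MLTable' file in {root}")
    elif not definition.read_text(encoding="utf-8").strip():
        problems.append(f"'MLTable' file is empty: {definition}")
    return problems


if __name__ == "__main__":
    for problem in check_mltable_dir("path/to/my_mltable_data"):
        print(problem)
```

An empty result only rules out the missing-path cases; a `ValidationException` can still occur if the file's contents do not match the schema.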
Warnings
- gotcha MLTable expects paths to be URIs (e.g., `file://`, `azureml://`, `http://`). While relative local paths often work implicitly, explicitly using `file://` for local files, especially when integrating with systems like AzureML, can prevent unexpected `FileNotFoundError` or permission issues.
- gotcha The `MLTable` YAML file must adhere to a specific schema. Incorrect YAML syntax, missing required fields, or invalid data source definitions can lead to `mltable.exceptions.ValidationException`.
- gotcha Using `mltable` with remote storage (e.g., Azure Blob Storage, ADLS Gen2) requires appropriate authentication and optional dependency packages (e.g., `azure-storage-blob`, `azure-identity`). Without correct setup, operations will fail with authentication errors or `FileNotFoundError`.
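To make the first warning concrete, a local path can be turned into an explicit `file://` URI with the standard library before it is handed to `mltable`. This is a minimal sketch. The helper name `to_file_uri` is hypothetical, and how `mltable` treats the resulting URI in a given environment is assumed rather than verified here.

```python
from pathlib import Path


def to_file_uri(local_path):
    """Convert a relative or absolute local path into an absolute file:// URI.

    Hypothetical helper (not part of mltable). The path does not need to
    exist yet; it is resolved against the current working directory.
    """
    return Path(local_path).resolve().as_uri()


if __name__ == "__main__":
    print(to_file_uri("data/sample.jsonl"))
```

Because the URI is absolute, it no longer depends on the working directory of the process that later reads the MLTable.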
Install
- pip install mltable
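After installing, it can be useful to confirm which version (if any) is visible to the interpreter you are actually running, since a `ModuleNotFoundError` often means the package went into a different environment. A minimal sketch using the standard library's `importlib.metadata` follows; the helper name `installed_version` is illustrative.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="mltable"):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("mltable is not installed in this environment")
    else:
        print(f"mltable {found} is installed")
```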
Imports
- MLTable
  from mltable import MLTable
  import mltable
- load
  import mltable
  mltable.load(...)
- from_json_lines_files
  import mltable
  mltable.from_json_lines_files(...)
Quickstart
import os

import mltable

# Create a dummy data file
os.makedirs('data', exist_ok=True)
with open('data/sample.jsonl', 'w') as f:
    f.write('{"id": 1, "value": "A"}\n')
    f.write('{"id": 2, "value": "B"}\n')

# Create an MLTable object from a local JSON Lines file.
# `paths` takes a list of dicts, here with a 'file' key per entry.
tbl = mltable.from_json_lines_files(paths=[{'file': 'data/sample.jsonl'}])

# Save the MLTable definition to a directory
output_dir = 'my_mltable_data'
os.makedirs(output_dir, exist_ok=True)
tbl.save(output_dir)
print(f"MLTable saved to '{output_dir}'")

# Load the MLTable
loaded_tbl = mltable.load(output_dir)

# Convert to a pandas DataFrame for inspection (requires pandas to be installed)
df = loaded_tbl.to_pandas_dataframe()
print("Loaded DataFrame:")
print(df)

# Clean up dummy files
os.remove('data/sample.jsonl')
os.rmdir('data')
os.remove(os.path.join(output_dir, 'MLTable'))
os.rmdir(output_dir)