MLTable

1.6.3 · active · verified Thu Apr 16

mltable provides Python APIs for creating, loading, and managing MLTable, a declarative, standardized format for defining the data used in machine learning workloads. It is used mainly within Azure Machine Learning to describe datasets stored locally or in the cloud, in formats such as Delta Lake, Parquet, CSV, and JSON Lines. The current version is 1.6.3, and new releases are published regularly.

Quickstart

This quickstart creates an MLTable from a local JSON Lines file, saves its definition to a directory, and loads it back. It then converts the loaded MLTable into a pandas DataFrame for inspection. The same create, save, and load pattern applies to the other supported file formats.

import os

import mltable

# Create a dummy data file
os.makedirs('data', exist_ok=True)
with open('data/sample.jsonl', 'w') as f:
    f.write('{"id": 1, "value": "A"}\n')
    f.write('{"id": 2, "value": "B"}\n')

# Create an MLTable object from a local JSON Lines file.
# `paths` takes a list of dicts keyed by 'file', 'folder', or 'pattern'.
tbl = mltable.from_json_lines_files(paths=[{'file': 'data/sample.jsonl'}])

# Save the MLTable to a directory
output_dir = 'my_mltable_data'
os.makedirs(output_dir, exist_ok=True)

tbl.save(output_dir)
print(f"MLTable saved to '{output_dir}'")

# Load the MLTable
loaded_tbl = mltable.load(output_dir)

# Convert to pandas DataFrame for inspection
df = loaded_tbl.to_pandas_dataframe()
print("Loaded DataFrame:")
print(df)

# Clean up dummy files
os.remove('data/sample.jsonl')
os.rmdir('data')
os.remove(os.path.join(output_dir, 'MLTable'))
os.rmdir(output_dir)
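
The mltable loader functions take their inputs as a list of single-key dicts rather than plain strings. The sketch below shows one way to build that list from ordinary path strings. `build_paths` is a hypothetical helper, not part of mltable, and its rules for telling files, folders, and glob patterns apart are assumptions made for illustration.

```python
import os


def build_paths(entries):
    """Convert plain path strings into a list of single-key path dicts.

    Hypothetical helper, not part of mltable. Assumed classification rules:
    - strings containing glob characters become {'pattern': ...}
    - existing directories or strings ending in a separator become {'folder': ...}
    - everything else becomes {'file': ...}
    """
    paths = []
    for entry in entries:
        if any(ch in entry for ch in "*?["):
            paths.append({"pattern": entry})
        elif os.path.isdir(entry) or entry.endswith(("/", os.sep)):
            paths.append({"folder": entry})
        else:
            paths.append({"file": entry})
    return paths


if __name__ == "__main__":
    # Print the dict form for a file, a glob pattern, and a folder.
    for spec in build_paths(["data/sample.jsonl", "data/*.jsonl", "data/"]):
        print(spec)
```

The resulting list can be passed as the `paths` argument of a loader such as `mltable.from_json_lines_files`. Building the dicts explicitly keeps the distinction between a single file, a folder, and a pattern visible at the call site.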
