{"id":7418,"library":"mltable","title":"MLTable","description":"mltable provides Python APIs for creating, loading, and managing the MLTable data format, a declarative and standardized way to define data for machine learning workloads. It is particularly used within Azure Machine Learning to specify datasets from various sources like local files, Delta Lake, Parquet, and CSV. The current version is 1.6.3, and it generally follows a regular release cadence.","status":"active","version":"1.6.3","language":"en","source_language":"en","source_url":"https://github.com/Azure/azure-ml-tables","tags":["machine-learning","data-format","azure","data-science","etl"],"install":[{"cmd":"pip install mltable","lang":"bash","label":"Basic Install"}],"dependencies":[{"reason":"Required for interacting with Azure Blob Storage data sources.","package":"azure-storage-blob","optional":true},{"reason":"Often used for efficient data handling with Parquet and other columnar formats.","package":"pyarrow","optional":true},{"reason":"Required for interacting with Delta Lake data sources.","package":"delta-kernel-python","optional":true}],"imports":[{"note":"While MLTable is a class, it's typically instantiated via factory methods like `mltable.from_json_lines_files()` rather than directly. 
The module `mltable` itself contains the core functionality.","wrong":"from mltable import MLTable","symbol":"MLTable","correct":"import mltable"},{"note":"The `load` function is a module-level function within the `mltable` package.","symbol":"load","correct":"import mltable\nmltable.load(...)"},{"note":"One of many factory methods for creating an MLTable from specific data sources.","symbol":"from_json_lines_files","correct":"import mltable\nmltable.from_json_lines_files(...)"}],"quickstart":{"code":"import os\n\nimport mltable\n\n# Create a dummy data file\nos.makedirs('data', exist_ok=True)\nwith open('data/sample.jsonl', 'w') as f:\n    f.write('{\"id\": 1, \"value\": \"A\"}\\n')\n    f.write('{\"id\": 2, \"value\": \"B\"}\\n')\n\n# Create an MLTable object from a local JSON Lines file.\n# `paths` takes a list of dicts keyed by 'file', 'folder', or 'pattern'.\ntbl = mltable.from_json_lines_files(paths=[{'file': 'data/sample.jsonl'}])\n\n# Save the MLTable definition to a directory\noutput_dir = 'my_mltable_data'\nos.makedirs(output_dir, exist_ok=True)\n\ntbl.save(output_dir)\nprint(f\"MLTable saved to '{output_dir}'\")\n\n# Load the MLTable\nloaded_tbl = mltable.load(output_dir)\n\n# Convert to a pandas DataFrame for inspection (pandas must be installed)\ndf = loaded_tbl.to_pandas_dataframe()\nprint(\"Loaded DataFrame:\")\nprint(df)\n\n# Clean up dummy files\nos.remove('data/sample.jsonl')\nos.rmdir('data')\nos.remove(os.path.join(output_dir, 'MLTable'))\nos.rmdir(output_dir)","lang":"python","description":"This quickstart creates an MLTable from a local JSON Lines file, saves it to a directory, and loads it back. It then converts the loaded MLTable into a pandas DataFrame for inspection. 
This is a common pattern for defining and managing datasets."},"warnings":[{"fix":"Prefix local paths with `file://` (e.g., `mltable.from_json_lines_files(paths=[{'file': 'file://./data.jsonl'}])`) or ensure your execution environment correctly resolves relative paths for `MLTable` operations.","message":"MLTable expects paths to be URIs (e.g., `file://`, `azureml://`, `http://`). Relative local paths often work implicitly, but using `file://` explicitly for local files, especially when integrating with systems like AzureML, can prevent unexpected `FileNotFoundError` or permission issues.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Review the official MLTable schema documentation. Use a YAML linter if authoring MLTable files manually. Ensure the `paths` field and any `transformations` you use are correctly defined.","message":"An MLTable definition (the YAML file named `MLTable`) must adhere to a specific schema. Incorrect YAML syntax, missing required fields, or invalid data source definitions can lead to `mltable.exceptions.ValidationException`.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Install the storage-specific packages your data source needs (e.g., `pip install azure-storage-blob azure-identity`). Ensure your environment has valid Azure credentials (e.g., via environment variables, Azure CLI login, or managed identity). Refer to the `azure-identity` documentation for credential setup.","message":"Using `mltable` with remote storage (e.g., Azure Blob Storage, ADLS Gen2) requires appropriate authentication and optional dependency packages (e.g., `azure-storage-blob`, `azure-identity`). 
Without correct setup, operations will fail with authentication errors or `FileNotFoundError`.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Run `pip install mltable` in your active Python environment.","cause":"The `mltable` package is not installed or not accessible in the current Python environment.","error":"ModuleNotFoundError: No module named 'mltable'"},{"fix":"Verify that the path provided to `mltable.load()` or within the MLTable definition points to an existing directory or file. Ensure correct casing and full paths if not using URIs. Check file permissions.","cause":"The specified path to the MLTable directory or a file referenced within the MLTable definition does not exist or is inaccessible.","error":"FileNotFoundError: No such file or directory: 'path/to/my_mltable_data'"},{"fix":"Inspect the `MLTable` file for syntax errors (e.g., indentation, missing colons) and ensure it includes all required fields and adheres to the official MLTable schema for its version. The error message usually provides more details on what failed validation.","cause":"The `MLTable` YAML file (or its JSON equivalent) at the specified path is syntactically incorrect or does not conform to the expected MLTable schema.","error":"mltable.exceptions.ValidationException: Validation error while parsing MLTable at 'MLTable'"}]}