Azure ML Data Preparation SDK

5.4.3 · deprecated · verified Fri Apr 10

The `azureml-dataprep` library is part of the Azure ML Python SDK v1. It loads, transforms, and writes data for machine learning workflows in the v1 ecosystem. As of version 5.4.3, its main use is creating `Dataset` objects that integrate with v1 Azure ML workspaces. It still receives maintenance updates, but for new development it is largely superseded by the Azure ML SDK v2 (`azure-ai-ml`), which handles data differently.

Warnings

Install

Imports

Quickstart

Demonstrates how to read local data into a `Dataflow` object, apply a basic transformation, and convert the result to a pandas DataFrame. These are the core `azureml-dataprep` operations for local data preparation, and none of them requires an active Azure ML workspace connection.

import os

import azureml.dataprep as dprep

# Create a dummy CSV file for demonstration
file_path = "quickstart_data.csv"
with open(file_path, "w") as f:
    f.write("id,name,value\n")
    f.write("1,apple,100\n")
    f.write("2,banana,200\n")
    f.write("3,orange,150\n")

try:
    # Read the CSV into a Dataflow object
    dataflow = dprep.read_csv(file_path)
    print("Original Dataflow (first 5 rows):")
    print(dataflow.head(5))

    # Perform a simple transformation: select specific columns
    transformed_dataflow = dataflow.keep_columns(columns=['name', 'value'])
    print("\nTransformed Dataflow (name, value columns, first 5 rows):")
    print(transformed_dataflow.head(5))

    # Convert the Dataflow to a Pandas DataFrame for local processing
    pandas_df = transformed_dataflow.to_pandas_dataframe()
    print("\nConverted to Pandas DataFrame:")
    print(pandas_df)

finally:
    # Clean up the dummy file
    if os.path.exists(file_path):
        os.remove(file_path)
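
To sanity-check what the column selection above should return without installing the SDK, the same read and keep-columns steps can be sketched with the Python standard library alone. This is an illustrative stand-in, not the library's implementation: the `keep_columns` helper below is a hypothetical function defined only for this sketch, and values are left as strings, whereas the SDK's type handling may differ.

```python
import csv
import io

# Same sample rows as the quickstart CSV, held in memory instead of on disk
CSV_TEXT = "id,name,value\n1,apple,100\n2,banana,200\n3,orange,150\n"


def keep_columns(rows, columns):
    """Hypothetical helper: return new row dicts with only the given columns."""
    return [{col: row[col] for col in columns} for row in rows]


# Parse the CSV text into a list of dicts keyed by the header row
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# Keep only 'name' and 'value', mirroring the keep_columns step in the quickstart
selected = keep_columns(rows, ["name", "value"])

for row in selected:
    print(row)
```

Each printed row should contain only the `name` and `value` keys, which is the shape to expect from the transformed `Dataflow` once it is converted to a DataFrame.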
