{"id":2408,"library":"azureml-dataprep","title":"Azure ML Data Preparation SDK","description":"The `azureml-dataprep` library is part of the Azure ML Python SDK v1, providing capabilities to load, transform, and write data for machine learning workflows within the v1 ecosystem. As of version 5.4.3, it is primarily used for creating `Dataset` objects that integrate with Azure ML workspaces (v1). While receiving maintenance updates, it is largely superseded by the Azure ML SDK v2 (`azure-ai-ml`) for new development, which offers different data handling paradigms.","status":"deprecated","version":"5.4.3","language":"en","source_language":"en","source_url":"https://github.com/Azure/azureml-python","tags":["Azure","Machine Learning","Data Prep","Data Transformation","SDK v1"],"install":[{"cmd":"pip install azureml-dataprep","lang":"bash","label":"Install `azureml-dataprep`"}],"dependencies":[{"reason":"Tight integration with Azure ML SDK v1 for workspace and dataset operations.","package":"azureml-core","optional":false}],"imports":[{"symbol":"Dataflow","correct":"from azureml.dataprep import Dataflow"},{"symbol":"read_csv","correct":"import azureml.dataprep as dprep\ndprep.read_csv(...)"}],"quickstart":{"code":"import azureml.dataprep as dprep\nimport pandas as pd\nimport os\n\n# Create a dummy CSV file for demonstration\nfile_path = \"quickstart_data.csv\"\nwith open(file_path, \"w\") as f:\n    f.write(\"id,name,value\\n\")\n    f.write(\"1,apple,100\\n\")\n    f.write(\"2,banana,200\\n\")\n    f.write(\"3,orange,150\\n\")\n\ntry:\n    # Read the CSV into a Dataflow object\n    dataflow = dprep.read_csv(file_path)\n    print(\"Original Dataflow (first 5 rows):\")\n    print(dataflow.head(5))\n\n    # Perform a simple transformation: select specific columns\n    transformed_dataflow = dataflow.keep_columns(columns=['name', 'value'])\n    print(\"\\nTransformed Dataflow (name, value columns, first 5 rows):\")\n    print(transformed_dataflow.head(5))\n\n    # Convert 
the Dataflow to a Pandas DataFrame for local processing\n    pandas_df = transformed_dataflow.to_pandas_dataframe()\n    print(\"\\nConverted to Pandas DataFrame:\")\n    print(pandas_df)\n\nfinally:\n    # Clean up the dummy file\n    if os.path.exists(file_path):\n        os.remove(file_path)","lang":"python","description":"Demonstrates how to read local data into a `Dataflow` object, apply a basic transformation, and convert the result to a Pandas DataFrame. It covers core `azureml-dataprep` functionality for local data preparation and does not require an active Azure ML workspace connection."},"warnings":[{"fix":"For new projects, consider using `azure-ai-ml` and its integrated data handling capabilities, which often leverage standard Python data libraries like Pandas and PyArrow with direct cloud storage access, or MLTable assets.","message":"The `azureml-dataprep` library is part of the Azure ML Python SDK v1, which is largely superseded by the v2 SDK (`azure-ai-ml`) for new development. Microsoft recommends migrating to the v2 SDK for modern Azure ML workflows.","severity":"deprecated","affected_versions":"All versions"},{"fix":"Be mindful of the lazy evaluation paradigm. Use `head()` or `to_pandas_dataframe()` periodically during development to inspect intermediate results and confirm that each transformation behaves as expected.","message":"Operations on `Dataflow` objects are lazily evaluated. Transformations are not applied until an action such as `to_pandas_dataframe()` or `head()` is called, so an error in an earlier step may only surface at that point. Note that `write_to_csv()` is itself lazy: it adds a write step and returns a new `Dataflow`, which executes only when `run_local()` is called on it.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Use isolated virtual environments for projects that rely on `azureml-dataprep` to prevent dependency conflicts. 
If possible, avoid mixing v1 and v2 SDK components in the same environment.","message":"`azureml-dataprep` is tightly coupled with `azureml-core` (v1 SDK) and may have version conflicts if other `azureml` packages, especially from the v2 SDK (`azure-ai-ml`), are installed in the same environment.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Ensure your Python environment is running Python 3.8 or a later compatible version (e.g., 3.9, 3.10) before installing `azureml-dataprep`.","message":"Requires Python 3.8 or higher. Older Python versions (e.g., 3.7) are not supported by recent `azureml-dataprep` releases.","severity":"breaking","affected_versions":">=5.0.0"}],"env_vars":null,"last_verified":"2026-04-10T00:00:00.000Z","next_check":"2026-07-09T00:00:00.000Z"}