{"id":5278,"library":"kedro","title":"Kedro","description":"Kedro is an open-source Python framework for creating reproducible, maintainable, and modular data science code. It applies software engineering best practices to data and analytics pipelines. The current version is 1.3.1; patch and minor releases typically ship monthly, with major versions arriving less often.","status":"active","version":"1.3.1","language":"en","source_language":"en","source_url":"https://github.com/kedro-org/kedro","tags":["data-pipelines","etl","mlops","data-science","orchestration","reproducibility"],"install":[{"cmd":"pip install kedro","lang":"bash","label":"Basic Install"},{"cmd":"pip install \"kedro[pandas,spark]\"","lang":"bash","label":"Install with Data Connectors (e.g., Pandas, Spark)"}],"dependencies":[{"reason":"Kedro requires Python >=3.10 as of version 1.1.0.","package":"python","optional":false}],"imports":[{"symbol":"Pipeline","correct":"from kedro.pipeline import Pipeline"},{"symbol":"node","correct":"from kedro.pipeline import node"},{"note":"`KedroDataCatalog` was renamed to `DataCatalog` and became the default in Kedro 1.0.0.","wrong":"from kedro.io import KedroDataCatalog","symbol":"DataCatalog","correct":"from kedro.io import DataCatalog"},{"note":"The CamelCase `*DataSet` spellings were removed in Kedro 0.19.0; use `MemoryDataset`.","wrong":"from kedro.io import MemoryDataSet","symbol":"MemoryDataset","correct":"from kedro.io import MemoryDataset"},{"symbol":"SequentialRunner","correct":"from kedro.runner import SequentialRunner"},{"symbol":"KedroSession","correct":"from kedro.framework.session import KedroSession"}],"quickstart":{"code":"from kedro.io import DataCatalog, MemoryDataset\nfrom kedro.pipeline import Pipeline, node\nfrom kedro.runner import SequentialRunner\n\n# 1. Define node functions (plain Python functions)\ndef greet(name: str) -> str:\n    \"\"\"A node that greets a given name.\"\"\"\n    return f\"Hello, {name}!\"\n\ndef capitalize(text: str) -> str:\n    \"\"\"A node that converts a string to uppercase.\"\"\"\n    return text.upper()\n\n# 2. Assemble nodes into a pipeline\ndef create_example_pipeline() -> Pipeline:\n    return Pipeline([\n        node(\n            func=greet,\n            inputs=\"input_name\",  # Input dataset key\n            outputs=\"greeting_message\",  # Output dataset key\n            name=\"greet_user_node\"\n        ),\n        node(\n            func=capitalize,\n            inputs=\"greeting_message\",\n            outputs=\"final_output\",  # Final output dataset key\n            name=\"capitalize_message_node\"\n        )\n    ])\n\n# 3. Create a DataCatalog with input data\n# In a real Kedro project, this is usually defined in conf/base/catalog.yml\ncatalog = DataCatalog({\n    \"input_name\": MemoryDataset(data=\"World\"),\n    \"final_output\": MemoryDataset()  # Register an output dataset to store results\n})\n\n# 4. Instantiate the pipeline and a runner\nmy_pipeline = create_example_pipeline()\nrunner = SequentialRunner()\n\n# 5. Run the pipeline\n# In a real Kedro project, `kedro run` via `KedroSession` orchestrates this.\nprint(\"Running Kedro pipeline...\")\nrunner.run(my_pipeline, catalog)\n\n# 6. Retrieve results from the catalog\n# Registered pipeline outputs remain available in the catalog after the run.\nfinal_message = catalog.load(\"final_output\")\nprint(f\"Pipeline finished. Final message: {final_message}\")\n# Expected output: Pipeline finished. Final message: HELLO, WORLD!\n","lang":"python","description":"This example demonstrates how to define Kedro nodes and combine them into a pipeline. It then executes the pipeline with a `SequentialRunner` and a `DataCatalog` of in-memory datasets. In a typical Kedro project, `kedro new` creates the project structure, and `kedro run` orchestrates execution via `KedroSession`, loading configuration from the `conf/` directory."},"warnings":[{"fix":"Upgrade your Python environment to 3.10 or a newer supported version.","message":"Kedro dropped support for Python 3.9 in version 1.1.0. 
Projects using Kedro 1.1.0 or newer must use Python 3.10 or later.","severity":"breaking","affected_versions":">=1.1.0"},{"fix":"Use `DataCatalog` instead of `KedroDataCatalog`. Review any custom catalog interactions for compatibility, especially error handling for missing datasets, which now raises `DatasetNotFoundError`.","message":"The `KedroDataCatalog` class was renamed to `DataCatalog` and became the default catalog implementation in Kedro 1.0.0. While most standard workflows were unaffected, programmatic interactions with the catalog, especially direct instantiation or accessing missing datasets (`__getitem__`), might require updates.","severity":"breaking","affected_versions":">=1.0.0"},{"fix":"Always initialize new projects with `kedro new` and adhere to the generated project structure. When modifying configuration, follow the `conf/base` and `conf/local` conventions.","message":"Kedro relies heavily on its project structure (created by `kedro new`) and configuration files in the `conf/` directory. Deviating from that structure, or creating projects manually without it, can lead to `KedroContextError` or `ConfigLoaderError`.","severity":"gotcha","affected_versions":"All versions"},{"fix":"In node inputs, use `params:<key>` to pass a single parameter value and `parameters` to pass the entire parameters dictionary. Both are loaded from `conf/base/parameters.yml` (plus environment overrides) and are also available programmatically via `context.params`.","message":"Confusion between `params:<key>` (a single parameter passed as a node input) and `parameters` (the full parameters dictionary, also usable as a node input) is a common pitfall; both come from `parameters.yml`, not from the catalog. The parameter validation introduced in Kedro >=1.3.0 specifically targets `params:` inputs.","severity":"gotcha","affected_versions":"All versions"},{"fix":"As of `kedro>=1.1.1`, only major version mismatches are strictly enforced. For older versions, ensure your project's `project_version` matches your installed Kedro package version. 
For new projects, ensure you're on a recent Kedro version to benefit from the more flexible version check.","message":"Prior to `kedro==1.1.1`, the `project_version` specified in `src/<project_name>/settings.py` had to *exactly match* the installed Kedro package version (including minor and patch versions) to avoid a `ProjectVersionError`.","severity":"gotcha","affected_versions":"<1.1.1"},{"fix":"Refactor pipelines to use modular pipeline features (`Pipeline(namespace=...)`) and explicit dataset naming conventions instead of relying on the deprecated `--namespace` flag.","message":"The `--namespace` CLI flag for `kedro run` was deprecated in version 0.19.15 and is discouraged. Kedro now promotes using proper modular pipelines and explicit dataset prefixing for organization.","severity":"deprecated","affected_versions":">=0.19.15"},{"fix":"Be aware that `@experimental` APIs are subject to change. Avoid using them in production code unless you are prepared to adapt to potential breaking changes in future releases.","message":"Public APIs marked with the `@experimental` decorator (introduced in 1.2.0) are unstable and may change without backward compatibility guarantees. Use them with caution.","severity":"gotcha","affected_versions":">=1.2.0"}],"env_vars":null,"last_verified":"2026-04-13T00:00:00.000Z","next_check":"2026-07-12T00:00:00.000Z"}