{"id":9147,"library":"ob-metaflow","title":"Metaflow","description":"Metaflow is a human-centric framework for building and managing real-life data science projects, from prototyping to production. It enables data scientists and ML engineers to rapidly develop, deploy, and operate ML workflows. The `ob-metaflow` distribution is a PyPI package that provides the core Metaflow library. Its current version is 2.19.21.1, and it follows the frequent release cadence of the main Metaflow project.","status":"active","version":"2.19.21.1","language":"en","source_language":"en","source_url":"https://github.com/Netflix/metaflow","tags":["MLOps","workflow-orchestration","data-science","machine-learning","ETL","reproducibility"],"install":[{"cmd":"pip install ob-metaflow","lang":"bash","label":"Install `ob-metaflow`"}],"dependencies":[{"reason":"Required for interacting with AWS S3 for artifact storage and remote execution (optional for local-only development).","package":"boto3","optional":true},{"reason":"Required for deploying and managing Metaflow flows on Kubernetes (optional for local or AWS deployments).","package":"kubernetes","optional":true}],"imports":[{"note":"The PyPI package is `ob-metaflow`, but the module imported is `metaflow`.","wrong":"from ob_metaflow import FlowSpec","symbol":"FlowSpec","correct":"from metaflow import FlowSpec"},{"symbol":"step","correct":"from metaflow import step"},{"note":"Provides access to the current run's metadata and parameters.","symbol":"current","correct":"from metaflow import current"}],"quickstart":{"code":"from metaflow import FlowSpec, step\n\nclass MyFirstMetaflowFlow(FlowSpec):\n    \"\"\"\n    A simple Metaflow flow demonstrating basic steps.\n    \"\"\"\n    @step\n    def start(self):\n        self.message = \"Hello Metaflow!\"\n        print(f\"Starting flow with message: {self.message}\")\n        self.next(self.process_data)\n\n    @step\n    def process_data(self):\n        self.data = 
[len(self.message), 42]\n        print(f\"Processing data: {self.data}\")\n        self.next(self.end)\n\n    @step\n    def end(self):\n        print(f\"Flow finished. Final data: {self.data}\")\n\nif __name__ == '__main__':\n    # Instantiating the flow class hands control to the Metaflow CLI.\n    # Run this file with: python your_flow_file.py run\n    MyFirstMetaflowFlow()","lang":"python","description":"This quickstart defines a simple Metaflow `FlowSpec` with three steps: `start`, `process_data`, and `end`. It prints messages and passes data between steps as artifacts. Metaflow flows are executed via the Metaflow CLI: instantiating the flow class under `if __name__ == '__main__':` hands control to the CLI, so run the file as `python your_flow_file.py run` to get artifact tracking, resumption, and distributed execution."},"warnings":[{"fix":"Always import symbols from the `metaflow` namespace, e.g., `from metaflow import FlowSpec, step`.","message":"The PyPI package name is `ob-metaflow`, but the Python module you import is `metaflow`. Ensure you always use `from metaflow import ...` in your code.","severity":"gotcha","affected_versions":"All versions of `ob-metaflow`."},{"fix":"Execute your flow with `python your_flow_file.py run`. Use other subcommands, such as `python your_flow_file.py help`, to see more options.","message":"Metaflow flows are run via Metaflow CLI subcommands (e.g., `python your_flow.py run`). Executing the script without a subcommand only prints CLI usage; it does not run the flow or activate Metaflow's tracking and artifact storage.","severity":"gotcha","affected_versions":"All versions."},{"fix":"Configure Metaflow with a remote data store, typically via environment variables (e.g., `METAFLOW_DATATOOLS_S3ROOT`) or a `~/.metaflow/config.json` file. 
Ensure appropriate cloud credentials are set up for access.","message":"For robust, resumable, and shareable flows, Metaflow requires external storage (e.g., AWS S3, Google Cloud Storage) for artifacts. Local storage is primarily for development and prototyping and is not recommended for production.","severity":"gotcha","affected_versions":"All versions."},{"fix":"When resuming old runs, try to use the same Metaflow version that created them. For new runs, ensure all components (local, remote) use consistent Metaflow versions. If migrating, consider re-running flows or manually porting artifacts.","message":"Metaflow's default serialization engine switched from 'pickle' to 'cloudpickle' in version 2.0. Additionally, the default protocol for 'cloudpickle' was updated in later 2.x versions. This can cause issues when resuming or inspecting old runs created with different Metaflow versions.","severity":"breaking","affected_versions":"Mainly 2.0+ when interacting with runs from <2.0, or specific 2.x versions with protocol changes."}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Install the package using `pip install ob-metaflow` in your active Python environment.","cause":"The `ob-metaflow` PyPI package has not been installed, or it's installed in a different Python environment.","error":"ModuleNotFoundError: No module named 'metaflow'"},{"fix":"Set the `METAFLOW_DATATOOLS_S3ROOT` environment variable (e.g., `export METAFLOW_DATATOOLS_S3ROOT=s3://your-bucket/metaflow`) or configure it in `~/.metaflow/config.json`. 
Ensure your AWS credentials are also correctly set.","cause":"Metaflow is trying to store artifacts remotely (e.g., when running with `--environment=conda` or `--datastore=s3`), but no S3 bucket has been configured.","error":"MetaflowException: You need to specify a S3 bucket or path using METAFLOW_DATATOOLS_S3ROOT or configure a default S3 root in ~/.metaflow/config.json"},{"fix":"Ensure your AWS credentials are configured via environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`), an AWS IAM role (for EC2/EKS), or a `~/.aws/credentials` file.","cause":"Metaflow requires AWS credentials to access S3 buckets for artifact storage and remote execution.","error":"MetaflowException: Could not find credentials to access S3"}]}