Feast Feature Store
Feast is an open-source feature store that enables data scientists and engineers to productionize machine learning features. It provides a consistent way to define, manage, and serve features for both model training (historical data) and online inference (low-latency serving). Feast is actively maintained, with new releases typically occurring monthly.
Warnings
- breaking Feast versions, especially in the 0.x series, often introduce breaking changes to the Python API, CLI, and `feature_store.yaml` schema. Always review release notes when upgrading.
- gotcha Feast requires provider-specific dependencies for connecting to various offline and online stores (e.g., AWS, GCP, Azure, Spark, Snowflake). These are not installed by default with `pip install feast`.
- gotcha The `FeatureStore` constructor expects a `repo_path` pointing to a directory containing `feature_store.yaml` and your feature definition files (`.py`). If not specified, it defaults to the current working directory, which can lead to `FileNotFoundError` or unexpected behavior.
- gotcha For local development with the 'local' provider, `registry` and `online_store` types (e.g., `sqlite`) should specify persistent file paths in `feature_store.yaml` (e.g., `registry: data/registry.db`, `online_store: type: sqlite`, `path: data/online_store.db`) to avoid losing definitions or online features between sessions.
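A minimal `feature_store.yaml` matching the gotcha above might look like this (the project name and file paths are placeholders):

```yaml
project: my_project           # placeholder
provider: local
registry: data/registry.db    # persistent file-based registry
online_store:
  type: sqlite
  path: data/online_store.db  # persistent sqlite online store
offline_store:
  type: local
```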
Install
- local
pip install feast
- with provider extras
pip install 'feast[aws]'  # or feast[gcp], feast[azure], feast[spark], etc.
Imports
- FeatureStore
from feast import FeatureStore
- Entity
from feast import Entity
- FeatureView
from feast import FeatureView
- Field
from feast import Field
- Field dtypes
from feast.types import Int64, String
- ValueType
from feast import ValueType
- FileSource
from feast.infra.offline_stores.file_source import FileSource
- RepoConfig
from feast import RepoConfig
Quickstart
import pandas as pd
import os
import shutil
from feast import FeatureStore, Entity, FeatureView, Field, ValueType
from feast.infra.offline_stores.file_source import FileSource
# --- 1. Define feature repository structure and data ---
# Create a dummy feature_repo directory for the quickstart
repo_path = "feature_repo"
os.makedirs(os.path.join(repo_path, "data"), exist_ok=True)  # creates the repo dir and the data/ dir used by the registry and online store
# Create a dummy feature_store.yaml inside the repo_path
with open(os.path.join(repo_path, "feature_store.yaml"), "w") as f:
    f.write(
        "project: default_project\n"
        "provider: local\n"
        "registry: data/registry.db\n"
        "online_store:\n"
        "  type: sqlite\n"
        "  path: data/online_store.db\n"
        "offline_store:\n"
        "  type: local\n"
    )
# Create dummy data for our feature view
user_df = pd.DataFrame({
    "user_id": [1001, 1002, 1003, 1004],
    "age": [25, 30, 22, 35],
    "city": ["NYC", "SF", "LA", "Chicago"],
    "event_timestamp": pd.to_datetime(
        ["2023-01-01", "2023-01-02", "2023-01-03", "2023-01-04"], utc=True
    ),
})
# Simulate writing to a file source within the repo for cleanliness
user_data_path = os.path.join(repo_path, "user_data.parquet")
user_df.to_parquet(user_data_path)
# Define an Entity
user = Entity(name="user_id", description="User ID", value_type=ValueType.INT64)
# Define an Offline FileSource
user_features_source = FileSource(
    path=user_data_path,
    timestamp_field="event_timestamp",
)
# Define a FeatureView
from feast.types import Int64, String  # Field dtypes come from feast.types, not ValueType
user_feature_view = FeatureView(
    name="user_profile",
    entities=[user],
    ttl=pd.Timedelta(days=365),  # pd.Timedelta subclasses datetime.timedelta
    schema=[
        Field(name="age", dtype=Int64),
        Field(name="city", dtype=String),
    ],
    source=user_features_source,
)
# --- 2. Initialize and apply FeatureStore ---
# Initialize FeatureStore by pointing to the repository path
fs = FeatureStore(repo_path=repo_path)
# Apply (register) the feature definitions programmatically
# This simulates `feast apply` CLI command.
fs.apply([user, user_feature_view])
# --- 3. Get historical features ---
# Create an entity_df for historical feature retrieval
entity_df = pd.DataFrame({
    "user_id": [1001, 1002, 1003, 1004],
    "event_timestamp": [pd.Timestamp("2023-01-05", tz="UTC")] * 4,
})
historical_features = fs.get_historical_features(
    entity_df=entity_df,
    features=["user_profile:age", "user_profile:city"],  # feature refs, not FeatureView objects
).to_df()
print("Historical features:\n", historical_features)
# --- 4. Get online features ---
# Before getting online features, you might need to materialize data
# to the online store. For 'local' provider with 'sqlite' online store,
# materialization populates the sqlite database.
fs.materialize_incremental(end_date=pd.Timestamp.now(tz="UTC"))
online_features = fs.get_online_features(
    features=["user_profile:age", "user_profile:city"],
    entity_rows=[{"user_id": 1001}, {"user_id": 1002}],
).to_dict()
print("Online features:\n", online_features)
# --- 5. Clean up generated files (optional) ---
shutil.rmtree(repo_path)