Feast Feature Store

0.62.0 · active · verified Sun Apr 12

Feast is an open-source feature store that enables data scientists and engineers to productionize machine learning features. It provides a consistent way to define, manage, and serve features for both model training (historical data) and online inference (low-latency serving). Feast is actively maintained, with new releases typically occurring monthly.

Quickstart

This quickstart defines an entity and a feature view, registers them in a local Feast repository, and retrieves both historical and online features. In a typical Feast workflow, `feature_store.yaml` and the feature definitions (for example `feature_repo.py`) live as files in a `feature_repo` directory and are registered with the `feast apply` CLI command. To stay runnable as a single Python script, this example instead creates the repository files itself, registers the definitions with `FeatureStore.apply()`, and deletes the generated files at the end.

import os
import shutil
from datetime import timedelta

import pandas as pd
from feast import Entity, FeatureStore, FeatureView, Field, ValueType
from feast.infra.offline_stores.file_source import FileSource
from feast.types import Int64, String

# --- 1. Define feature repository structure and data ---

# Create a dummy feature_repo directory for the quickstart
repo_path = "feature_repo"
os.makedirs(repo_path, exist_ok=True)

# Create a dummy feature_store.yaml inside the repo_path
with open(os.path.join(repo_path, "feature_store.yaml"), "w") as f:
    f.write("project: default_project\n")
    f.write("provider: local\n")
    f.write("registry: data/registry.db\n")
    f.write("online_store:\n")
    f.write("    type: sqlite\n")
    f.write("    path: data/online_store.db\n")
    f.write("offline_store:\n")
    # The file-based offline store is selected with type "file"
    f.write("    type: file\n")

# Create dummy data for our feature view
user_df = pd.DataFrame({
    "user_id": [1001, 1002, 1003, 1004],
    "age": [25, 30, 22, 35],
    "city": ["NYC", "SF", "LA", "Chicago"],
    "event_timestamp": pd.to_datetime(
        ["2023-01-01", "2023-01-02", "2023-01-03", "2023-01-04"], utc=True
    ),
})

# Write the data to a Parquet file inside the repo. An absolute path is used
# so the source resolves the same way however relative paths are interpreted.
user_data_path = os.path.abspath(os.path.join(repo_path, "user_data.parquet"))
user_df.to_parquet(user_data_path)

# Define an Entity; join_keys names the column used to join feature rows
user = Entity(
    name="user_id",
    join_keys=["user_id"],
    description="User ID",
    value_type=ValueType.INT64,
)

# Define an Offline FileSource
user_features_source = FileSource(
    path=user_data_path,
    timestamp_field="event_timestamp",
)

# Define a FeatureView. Field takes a dtype from feast.types,
# not a ValueType.
user_feature_view = FeatureView(
    name="user_profile",
    entities=[user],
    ttl=timedelta(days=365),
    schema=[
        Field(name="age", dtype=Int64),
        Field(name="city", dtype=String),
    ],
    source=user_features_source,
)

# --- 2. Initialize and apply FeatureStore ---

# Initialize FeatureStore by pointing to the repository path
fs = FeatureStore(repo_path=repo_path)

# Apply (register) the feature definitions programmatically
# This simulates `feast apply` CLI command.
fs.apply([user, user_feature_view])

# --- 3. Get historical features ---

# Create an entity_df for historical feature retrieval
entity_df = pd.DataFrame({
    "user_id": [1001, 1002, 1003, 1004],
    "event_timestamp": [pd.Timestamp("2023-01-05", tz="UTC")] * 4,
})

# get_historical_features takes feature references ("<view>:<feature>")
# or a FeatureService, not FeatureView objects.
historical_features = fs.get_historical_features(
    entity_df=entity_df,
    features=["user_profile:age", "user_profile:city"],
).to_df()

print("Historical features:\n", historical_features)

# --- 4. Get online features ---

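# Online reads only return data that has been materialized into the online
# store (here, the SQLite file configured in feature_store.yaml).
# The sample rows are dated January 2023, so an explicit window is used.
# On a first run, materialize_incremental derives its start from the feature
# view's TTL (now minus 365 days) and would likely skip rows this old.
fs.materialize(
    start_date=pd.Timestamp("2022-12-31", tz="UTC"),
    end_date=pd.Timestamp.now(tz="UTC"),
)

online_features = fs.get_online_features(
    features=["user_profile:age", "user_profile:city"],
    entity_rows=[{"user_id": 1001}, {"user_id": 1002}],
).to_dict()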

print("Online features:\n", online_features)

# --- 5. Clean up generated files (optional) ---
shutil.rmtree(repo_path)
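
The historical retrieval step performs a point-in-time join: for each row of the entity dataframe, it returns the most recent feature values whose timestamp is not later than the row's `event_timestamp` and not older than the feature view's TTL. The following is a minimal conceptual sketch of that rule in plain Python, not Feast's implementation. The helper `point_in_time_lookup`, the extra 2023-01-10 row, and the inclusive treatment of the TTL boundary are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def utc(year, month, day):
    """Build a timezone-aware UTC datetime for the sample data."""
    return datetime(year, month, day, tzinfo=timezone.utc)


def point_in_time_lookup(feature_rows, entity_key, as_of, ttl):
    """Return the newest row for entity_key with
    as_of - ttl <= event_timestamp <= as_of, or None if there is none.

    Hypothetical helper: it mimics the join rule conceptually and is
    not part of the Feast API.
    """
    candidates = [
        row
        for row in feature_rows
        if row["user_id"] == entity_key
        and as_of - ttl <= row["event_timestamp"] <= as_of
    ]
    return max(candidates, key=lambda row: row["event_timestamp"], default=None)


feature_rows = [
    {"user_id": 1001, "age": 25, "city": "NYC", "event_timestamp": utc(2023, 1, 1)},
    # Hypothetical later update, added only to show that "future" rows are ignored
    {"user_id": 1001, "age": 26, "city": "NYC", "event_timestamp": utc(2023, 1, 10)},
    {"user_id": 1002, "age": 30, "city": "SF", "event_timestamp": utc(2023, 1, 2)},
]
ttl = timedelta(days=365)

# As of 2023-01-05 only the 2023-01-01 row is visible for user 1001
print(point_in_time_lookup(feature_rows, 1001, utc(2023, 1, 5), ttl)["age"])  # 25

# Two years later every row is older than the TTL, so nothing is returned
print(point_in_time_lookup(feature_rows, 1001, utc(2025, 1, 1), ttl))  # None
```

The same reasoning explains why the entity dataframe in the quickstart uses 2023-01-05 timestamps: they fall after the sample feature rows and well within the 365-day TTL, so every user gets a match.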
