YDF (Yggdrasil Decision Forests)

0.16.1 · active · verified Thu Apr 16

YDF (short for Yggdrasil Decision Forests) is a library for training, serving, evaluating, and analyzing decision forest models such as Random Forest and Gradient Boosted Trees. It is a lightweight, efficient Python wrapper around the C++ Yggdrasil Decision Forests library. YDF is the official successor to TensorFlow Decision Forests (TF-DF) and is the recommended choice over TF-DF for new projects. It is actively developed, with frequent releases.

Install

pip install ydf

Imports

import ydf

Quickstart

This quickstart imports the library, loads the 'Adult' census dataset with Pandas, trains a Gradient Boosted Trees model for binary classification (predicting the `income` column), evaluates it on a held-out test split, makes predictions, and saves and reloads the trained model.

import sys

import pandas as pd
import ydf

# Load the train and test splits of the Adult dataset with Pandas
ds_path = "https://raw.githubusercontent.com/google/yggdrasil-decision-forests/main/yggdrasil_decision_forests/test_data/dataset/"
try:
    train_ds = pd.read_csv(f"{ds_path}adult_train.csv")
    test_ds = pd.read_csv(f"{ds_path}adult_test.csv")
except Exception as e:
    print(f"Could not load datasets: {e}. Check your internet connection or provide local paths.")
    sys.exit(1)

# Train a Gradient Boosted Trees model
# `label` names the target column to predict ("income").
# verbose=0 is passed to train() to suppress the training logs.
model = ydf.GradientBoostedTreesLearner(label="income").train(train_ds, verbose=0)

# Evaluate the model on the held-out test split
print("Model Evaluation:")
print(model.evaluate(test_ds))

# Generate predictions; predict() returns a NumPy array (for this
# binary task, one probability per example), not a DataFrame
predictions = model.predict(test_ds)
print("\nFirst 5 predictions:")
print(predictions[:5])

# Save the model to disk and load it back
model_path = "/tmp/my_ydf_model"
model.save(model_path)
loaded_model = ydf.load_model(model_path)
print(f"\nModel saved to '{model_path}' and reloaded successfully.")
