{"id":7876,"library":"ydf","title":"YDF (Yggdrasil Decision Forests)","description":"YDF (short for Yggdrasil Decision Forests) is a library for training, serving, evaluating, and analyzing decision forest models such as Random Forest and Gradient Boosted Trees. It acts as a lightweight, efficient wrapper around the C++ Yggdrasil Decision Forests library. YDF is the official successor to TensorFlow Decision Forests (TF-DF) and is recommended for new projects due to its superior performance and features. It is actively developed with frequent releases.","status":"active","version":"0.16.1","language":"en","source_language":"en","source_url":"https://github.com/google/yggdrasil-decision-forests.git","tags":["machine learning","decision forests","random forest","gradient boosting","classification","regression","ranking","model interpretation"],"install":[{"cmd":"pip install ydf -U","lang":"bash","label":"Install latest version"}],"dependencies":[{"reason":"Required Python version.","package":"python","version":">=3.9","optional":false},{"reason":"Commonly used for data handling, especially for CSV datasets.","package":"pandas","optional":true},{"reason":"Required for integration with TensorFlow, e.g., exporting models to TensorFlow SavedModel format via ydf-tf.","package":"tensorflow","optional":true},{"reason":"Required for exporting YDF models to TensorFlow SavedModel format and loading them with TensorFlow.","package":"ydf-tf","optional":true}],"imports":[{"note":"Primary import for all YDF functionalities.","symbol":"ydf","correct":"import ydf"},{"note":"Most common classes like Learners and Models are exposed directly under the top-level 'ydf' namespace for simplicity.","wrong":"from ydf.learner import GradientBoostedTreesLearner","symbol":"GradientBoostedTreesLearner","correct":"import ydf\nmodel = ydf.GradientBoostedTreesLearner(...)"}],"quickstart":{"code":"import ydf\nimport pandas as pd\n\n# Load dataset with Pandas\nds_path = 
\"https://raw.githubusercontent.com/google/yggdrasil-decision-forests/main/yggdrasil_decision_forests/test_data/dataset/\"\ntry:\n    train_ds = pd.read_csv(f\"{ds_path}adult_train.csv\")\n    test_ds = pd.read_csv(f\"{ds_path}adult_test.csv\")\nexcept Exception as e:\n    print(f\"Could not load datasets: {e}. Check your internet connection or provide local paths.\")\n    raise SystemExit(1)\n\n# Train a Gradient Boosted Trees model\n# 'label' is the target column for prediction.\n# verbose=0 to suppress training logs for cleaner output, default is 1.\nmodel = ydf.GradientBoostedTreesLearner(label=\"income\", verbose=0).train(train_ds)\n\n# Evaluate the model\nprint(\"Model Evaluation:\")\nprint(model.evaluate(test_ds))\n\n# Generate predictions (returned as a NumPy array, so slice instead of calling .head())\npredictions = model.predict(test_ds)\nprint(\"\\nFirst 5 predictions:\")\nprint(predictions[:5])\n\n# Save and Load the model\nmodel_path = \"/tmp/my_ydf_model\"\nmodel.save(model_path)\nloaded_model = ydf.load_model(model_path)\nprint(f\"\\nModel saved to '{model_path}' and reloaded successfully.\")\n","lang":"python","description":"This quickstart demonstrates how to import the library, load a sample dataset using Pandas, train a Gradient Boosted Trees model, evaluate its performance, make predictions, and save/load the trained model. It uses the 'Adult' dataset for a binary classification task."},"warnings":[{"fix":"Install `ydf-tf` (`pip install ydf-tf`) and use the recommended export methods provided by `ydf-tf` if TensorFlow SavedModel export is necessary.","message":"The method `model.to_tensorflow_saved_model(mode=\"keras\")` is strongly discouraged and will be removed in a future version. 
Exporting YDF models to TensorFlow SavedModel now primarily uses the separate `ydf-tf` package.","severity":"breaking","affected_versions":"0.15.0+"},{"fix":"Upgrade your Python environment to version 3.9 or higher.","message":"Support for Python 3.8 was removed, and the package moved to `manylinux_2_28`.","severity":"breaking","affected_versions":"0.14.0+"},{"fix":"For reproducible results, keep the input dataset (columns, order, values) and the YDF version strictly identical across training runs, and explicitly set a random seed (e.g., the learner's `random_seed` hyperparameter) for the stochastic parts of the algorithm.","message":"YDF training is deterministic only when the inputs and the YDF version are identical. Adding new columns, reordering existing columns, or making slight changes to the input data can produce a different model, because some training components are stochastic (e.g., feature sampling) and depend on the pseudo-random number generator's initialization.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Set `verbose=0` to suppress all logs or `verbose=2` to print all logs on all surfaces (e.g., notebook and console) if detailed debugging is needed.","message":"The `verbose` parameter in learners (e.g., `GradientBoostedTreesLearner`) controls the amount of logging. The default (`verbose=1`) might produce extensive output in notebooks or consoles, potentially obscuring important information.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Update your code to use `LAMBDA_MART_NDCG` for ranking tasks. 
The old name is deprecated but still functions.","message":"The loss metric `LAMBDA_MART_NDCG5` has been renamed to `LAMBDA_MART_NDCG`.","severity":"deprecated","affected_versions":"0.11.0+"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Run `pip install ydf` to install the library.","cause":"The YDF library has not been installed in your current Python environment.","error":"ModuleNotFoundError: No module named 'ydf'"},{"fix":"Review your dataset for column data types and ensure they are appropriate for the features you intend to use. Verify that the `label` parameter in your learner (e.g., `ydf.GradientBoostedTreesLearner(label=\"your_target_column\")`) correctly points to an existing column with suitable data for the task (classification, regression, etc.). YDF handles missing values automatically, but specific preprocessing might still be beneficial for certain data types or tasks.","cause":"The input data (e.g., a Pandas DataFrame or CSV) contains unexpected data types in columns that YDF cannot automatically interpret or use for training the specified task, or the 'label' column is missing/incorrectly specified.","error":"TypeError: 'str' object cannot be interpreted as an integer (or similar data-related TypeErrors/ValueErrors during training)"},{"fix":"Install the `ydf-tf` package (`pip install ydf-tf`) and consult the latest YDF documentation for the correct way to export models to TensorFlow SavedModel format using the `ydf-tf` integration. 
The `mode=\"keras\"` option for direct export is deprecated.","cause":"You are trying to use an outdated or incorrect method to export a YDF model to TensorFlow SavedModel format, likely from an older TF-DF pattern, without the `ydf-tf` package installed or correctly imported.","error":"AttributeError: 'Model' object has no attribute 'to_tensorflow_saved_model'"},{"fix":"Ensure your training dataset contains columns other than the label that YDF can use as features. If you are explicitly defining features, double-check your `features` argument to the learner. YDF usually auto-detects features, so this often indicates an empty feature set after exclusions.","cause":"No valid input features were provided or automatically detected for training after excluding the label column and any manually excluded features.","error":"RuntimeError: Learner 'GRADIENT_BOOSTED_TREES' requires at least one feature. Check 'exclude_non_specified_features' and 'features' arguments."}]}