XGBoost

3.2.0 · active · verified Sat Mar 28

XGBoost (eXtreme Gradient Boosting) is an optimized, distributed gradient boosting library designed for efficiency, flexibility, and portability. It implements machine learning algorithms under the Gradient Boosting framework, excelling in tasks like classification, regression, and ranking. It currently stands at version 3.2.0, with frequent patch releases and major updates typically occurring every 6-12 months.

Warnings
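
The use_label_encoder argument of XGBClassifier was deprecated and has since been removed; current releases infer class labels directly, so it should no longer be passed. Similarly, num_class is inferred from the training labels by the scikit-learn wrapper, but it must still be set explicitly when training with the native xgb.train API and a multi-class objective.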

Install
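
XGBoost ships prebuilt wheels on PyPI; a conda-forge package is also available.

pip install xgboost
# or, via conda:
conda install -c conda-forge py-xgboost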

Imports
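
The quickstart below relies on the following imports; scikit-learn is used only for the dataset, the train/test split, and the accuracy metric.

import xgboost as xgb
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split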

Quickstart

This quickstart demonstrates how to use XGBoost for a multi-class classification task. It shows both the scikit-learn compatible API (XGBClassifier) for ease of integration and the native XGBoost API (xgb.DMatrix, xgb.train) for more fine-grained control. The Iris dataset is used for simplicity, showcasing data loading, splitting, model training, prediction, and evaluation.

import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score

# 1. Load data
iris = load_iris()
X, y = iris.data, iris.target

# 2. Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 3. Initialize and train an XGBoost classifier.
#    num_class is inferred from y, and use_label_encoder was deprecated and
#    later removed, so neither needs to be passed on current releases.
model = xgb.XGBClassifier(
    objective='multi:softprob',  # multi-class classification with probability output
    eval_metric='mlogloss',
)
model.fit(X_train, y_train)

# 4. Make predictions
y_pred = model.predict(X_test)

# 5. Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

# Native API example (alternative to Scikit-learn API)
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

params = {
    'objective': 'multi:softprob',
    'num_class': len(iris.target_names),
    'eval_metric': 'mlogloss'
}

num_round = 100
bst = xgb.train(params, dtrain, num_round, evals=[(dtest, 'test')])
preds_native = bst.predict(dtest)
# For multi:softprob, predict() returns an (n_samples, num_class) array of
# probabilities; argmax along axis 1 recovers the predicted class labels.
preds_class_native = preds_native.argmax(axis=1)
accuracy_native = accuracy_score(y_test, preds_class_native)
print(f"Native API Accuracy: {accuracy_native:.2f}")
