{"id":731,"library":"xgboost","title":"XGBoost","description":"XGBoost (eXtreme Gradient Boosting) is an optimized, distributed gradient boosting library designed for efficiency, flexibility, and portability. It implements machine learning algorithms under the Gradient Boosting framework, excelling in tasks like classification, regression, and ranking. It currently stands at version 3.2.0, with frequent patch releases and major updates typically occurring every 6-12 months.","status":"active","version":"3.2.0","language":"python","source_language":"en","source_url":"https://github.com/dmlc/xgboost","tags":["machine-learning","gradient-boosting","decision-trees","classification","regression","ensemble-learning"],"install":[{"cmd":"pip install xgboost","lang":"bash","label":"Install stable version"}],"dependencies":[{"reason":"Essential for data handling, especially with the native API's DMatrix.","package":"numpy","optional":false},{"reason":"Used for handling sparse matrix data.","package":"scipy","optional":true},{"reason":"Commonly used for data preprocessing and model evaluation; XGBoost also provides a scikit-learn compatible API (XGBClassifier, XGBRegressor).","package":"scikit-learn","optional":true},{"reason":"Required for plotting utilities like feature importance and trees.","package":"matplotlib","optional":true},{"reason":"Required for visualizing decision trees built by XGBoost (in conjunction with matplotlib).","package":"graphviz","optional":true}],"imports":[{"symbol":"xgboost","correct":"import xgboost as xgb"},{"symbol":"XGBClassifier","correct":"from xgboost import XGBClassifier"},{"symbol":"XGBRegressor","correct":"from xgboost import XGBRegressor"},{"note":"While DMatrix can be imported directly, the conventional and often safer practice (especially with older versions or in complex setups) is to access it via the 'xgb' alias, i.e., xgb.DMatrix. 
Although the direct import is valid Python, using the 'xgb' alias keeps native-API code explicit and avoids confusion when mixing it with the scikit-learn interface.","wrong":"from xgboost import DMatrix","symbol":"DMatrix","correct":"import xgboost as xgb\ndtrain = xgb.DMatrix(data, label=label)"}],"quickstart":{"code":"import xgboost as xgb\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.datasets import load_iris\nfrom sklearn.metrics import accuracy_score\n\n# 1. Load data\niris = load_iris()\nX, y = iris.data, iris.target\n\n# 2. Split data into training and testing sets\nX_train, X_test, y_train, y_test = train_test_split(\n    X, y, test_size=0.2, random_state=42\n)\n\n# 3. Initialize and train an XGBoost Classifier\n# num_class is inferred from the labels by the scikit-learn API, and\n# use_label_encoder is deprecated, so neither should be passed here.\nmodel = xgb.XGBClassifier(objective='multi:softprob',  # Multi-class classification with probability output\n                          eval_metric='mlogloss')\nmodel.fit(X_train, y_train)\n\n# 4. Make predictions\ny_pred = model.predict(X_test)\n\n# 5. 
Evaluate the model\naccuracy = accuracy_score(y_test, y_pred)\nprint(f\"Accuracy: {accuracy:.2f}\")\n\n# Native API example (alternative to Scikit-learn API)\ndtrain = xgb.DMatrix(X_train, label=y_train)\ndtest = xgb.DMatrix(X_test, label=y_test)\n\nparams = {\n    'objective': 'multi:softprob',\n    'num_class': len(iris.target_names),\n    'eval_metric': 'mlogloss'\n}\n\nnum_round = 100\nbst = xgb.train(params, dtrain, num_round, evals=[(dtest, 'test')])\npreds_native = bst.predict(dtest)\n# For multi:softprob, preds_native are probabilities, need argmax for class labels\npreds_class_native = [p.argmax() for p in preds_native]\naccuracy_native = accuracy_score(y_test, preds_class_native)\nprint(f\"Native API Accuracy: {accuracy_native:.2f}\")","lang":"python","description":"This quickstart demonstrates how to use XGBoost for a multi-class classification task. It shows both the scikit-learn compatible API (XGBClassifier) for ease of integration and the native XGBoost API (xgb.DMatrix, xgb.train) for more fine-grained control. The Iris dataset is used for simplicity, showcasing data loading, splitting, model training, prediction, and evaluation."},"warnings":[{"fix":"Migrate deprecated GPU-related parameters to the new `device` parameter (e.g., `device='cuda'` for GPU). Review and update code that explicitly set `tree_method` or relied on the old `predictor` parameter.","message":"In XGBoost 2.0, several parameters related to GPU usage (`gpu_id`, `gpu_hist`, `gpu_predictor`, `cpu_predictor`, `gpu_coord_descent`) were replaced by a single `device` parameter. The `hist` tree method also became the default. The `predictor` parameter was removed.","severity":"breaking","affected_versions":">=2.0.0"},{"fix":"Update calls to prediction functions to use `X` for data arguments. If your code relied on auto-generated feature names from NumPy arrays, you might need to provide them explicitly. 
Verify model evaluation logic, especially for binary classification with raw logistic output.","message":"XGBoost 2.0 introduced breaking changes in prediction functions, renaming all data parameters to `X` for better scikit-learn estimator interface compliance. It also dropped the generation of pseudo-feature names for `np.ndarray` inputs to `DMatrix` and changed the default evaluation metric for `binary:logitraw` to `logloss`.","severity":"breaking","affected_versions":">=2.0.0"},{"fix":"Remove usages of `DeviceQuantileDMatrix`. If you have code saving models in old formats, consider updating to current supported formats. Ensure your CUDA environment meets the minimum requirement of 12.0+ if using GPU acceleration.","message":"XGBoost 3.0 removed the deprecated `DeviceQuantileDMatrix`, dropped support for saving models in certain deprecated formats (though loading old models is still supported), and removed support for legacy (blocking) CUDA streams. It also now requires CUDA 12.0 or later.","severity":"breaking","affected_versions":">=3.0.0"},{"fix":"Remove `use_label_encoder=True/False` from `XGBClassifier` initializations. In older code that emitted warnings, setting `use_label_encoder=False` was a common workaround.","message":"The `use_label_encoder` parameter of `XGBClassifier` was deprecated and is now effectively ignored (or raises warnings/errors depending on the version). Modern XGBoost handles label encoding internally.","severity":"deprecated","affected_versions":">=1.6.0 (deprecation), >=2.0.0 (removed effectively)"},{"fix":"Always perform hyperparameter tuning using techniques like Grid Search, Random Search, or more advanced optimization frameworks (e.g., Optuna) tailored to your dataset's characteristics. 
Focus on `learning_rate`, `max_depth`, `min_child_weight`, `subsample`, `colsample_bytree`, and `gamma`.","message":"Blindly using default hyperparameters without tuning for your specific dataset is a common mistake, leading to sub-optimal model performance or overfitting/underfitting. XGBoost offers a wide range of parameters that need careful tuning.","severity":"gotcha","affected_versions":"All versions"},{"fix":"For binary classification, use the `scale_pos_weight` parameter. XGBoost has no `class_weight` parameter; for multi-class problems, pass per-sample weights via the `sample_weight` argument of `fit`. Supplement these with sampling techniques like SMOTE if necessary.","message":"Ignoring class imbalance in your dataset can lead to models that perform poorly on the minority class, which is often the class of most interest. XGBoost provides mechanisms to address this.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Ensure all necessary external libraries, such as scikit-learn, are installed in your Python environment. You can typically install scikit-learn using `pip install scikit-learn` or by including it in your project's `requirements.txt` file.","message":"When using XGBoost with common machine learning workflows, libraries like scikit-learn are frequently used for tasks such as data splitting (`train_test_split`), preprocessing, and evaluation metrics. A `ModuleNotFoundError` for such libraries indicates that they are not installed in the current Python environment.","severity":"gotcha","affected_versions":"All versions (dependency issue)"},{"fix":"Ensure 'cmake' and a C++ compiler (e.g., g++ or clang) are installed on your system before attempting to install XGBoost. For Alpine Linux, this typically involves `apk add cmake g++`.","message":"Installing XGBoost on certain Linux distributions, especially minimal ones like Alpine, requires build tools like 'cmake' and a C++ compiler. 
Without these, the installation process will fail with a 'FileNotFoundError' for 'cmake' or similar compilation errors.","severity":"gotcha","affected_versions":"All versions (when installing from source or wheels that require compilation)"}],"env_vars":null,"last_verified":"2026-05-12T18:22:09.686Z","next_check":"2026-06-26T00:00:00.000Z","problems":[{"fix":"pip install xgboost","cause":"The `xgboost` library is not installed in the current Python environment.","error":"ModuleNotFoundError: No module named 'xgboost'"},{"fix":"import xgboost as xgb\nimport numpy as np\n\n# Assuming X and y are NumPy arrays or Pandas DataFrames\ndmatrix_data = xgb.DMatrix(X, label=y)\n# Use dmatrix_data for xgb.train or other DMatrix-requiring functions","cause":"The `xgboost.train` function or some core XGBoost C API methods require input data to be an `xgboost.DMatrix` object, but a raw data structure like a NumPy array or Pandas DataFrame was provided.","error":"ValueError: Invalid DMatrix: xgboost.DMatrix is required for input data."},{"fix":"import pandas as pd\n\n# Assuming X_train was a DataFrame used for model training\n# Ensure X_predict has the same columns in the same order as X_train\nX_predict = pd.DataFrame(some_new_data, columns=X_train.columns)\nmodel.predict(X_predict)","cause":"The feature names or their order in the input data provided for prediction or evaluation do not match those used when the XGBoost model was trained.","error":"ValueError: feature_names mismatch"},{"fix":"# Check the official XGBoost documentation for available parameters\n# Correct a common typo (e.g., 'sub_sample' instead of 'subsample'):\n# Incorrect:\n# model = xgb.XGBClassifier(sub_sample=0.8)\n# Correct:\nmodel = xgb.XGBClassifier(subsample=0.8)\n\n# Note: Parameters for xgb.train are passed in a dictionary, while for XGBClassifier/XGBRegressor they are keyword arguments.","cause":"An unrecognized or misspelled parameter was passed to an XGBoost model constructor (e.g., `XGBClassifier`) 
or the `xgb.train` function.","error":"XGBoostError: Unknown parameter: <parameter_name>"}],"ecosystem":"pypi","meta_description":null,"install_score":50,"install_tag":"draft","quickstart_score":0,"quickstart_tag":"stale","pypi_latest":"3.2.0","install_checks":{"last_tested":"2026-05-12","tag":"draft","tag_description":"notable install failures or slow imports","results":[{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":" $EXIT -eq 0 ","exit_code":1,"wheel_type":null,"failure_reason":"build_error","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":" $EXIT -eq 0 ","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":17.7,"import_time_s":0.95,"mem_mb":22.9,"disk_size":"843M"},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":0.71,"mem_mb":22.9,"disk_size":"833M"},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":" $EXIT -eq 0 ","exit_code":1,"wheel_type":null,"failure_reason":"build_error","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":" $EXIT -eq 0 
","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":19.1,"import_time_s":1.42,"mem_mb":25.4,"disk_size":"857M"},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":1.16,"mem_mb":25.4,"disk_size":"846M"},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":" $EXIT -eq 0 ","exit_code":1,"wheel_type":null,"failure_reason":"build_error","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":" $EXIT -eq 0 ","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":18.5,"import_time_s":1.39,"mem_mb":25.3,"disk_size":"843M"},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":1.25,"mem_mb":25.3,"disk_size":"832M"},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":" $EXIT -eq 0 ","exit_code":1,"wheel_type":null,"failure_reason":"build_error","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":" $EXIT -eq 0 
","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":16.6,"import_time_s":1.29,"mem_mb":24.8,"disk_size":"842M"},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":1.23,"mem_mb":24.8,"disk_size":"831M"},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":" $EXIT -eq 0 ","exit_code":1,"wheel_type":null,"failure_reason":"build_error","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":" $EXIT -eq 0 ","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":20.6,"import_time_s":0.81,"mem_mb":19.8,"disk_size":"955M"},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":0.69,"mem_mb":19.8,"disk_size":"944M"}]},"quickstart_checks":{"last_tested":"2026-04-24","tag":"stale","tag_description":"widespread failures or data too old to trust","results":[{"runtime":"python:3.10-alpine","exit_code":1},{"runtime":"python:3.10-slim","exit_code":-1},{"runtime":"python:3.11-alpine","exit_code":1},{"runtime":"python:3.11-slim","exit_code":-1},{"runtime":"python:3.12-alpine","exit_code":1},{"runtime":"python:3.12-slim","exit_code":-1},{"runtime":"python:3.13-alpine","exit_code":1},{"runtime":"python:3.13-slim","exit_code":-1},{"runtime":"python:3.9-alpine","exit_code":1},{"runtime":"python:3.9-slim","exit_code":-1}]}}