{"id":15061,"library":"xgboost-cpu","title":"XGBoost (CPU Version)","description":"XGBoost is an optimized distributed gradient boosting library designed for speed and performance. The `xgboost-cpu` package serves as a convenience installer, providing the core XGBoost library (version 2.0.3 as of `xgboost-cpu==3.2.0`) compiled with CPU-only optimizations. This meta-package, currently at version 3.2.0, lets users install a CPU-only build without pulling in GPU dependencies, and is updated alongside major XGBoost releases.","status":"active","version":"3.2.0","language":"en","source_language":"en","source_url":"https://github.com/dmlc/xgboost","tags":["machine-learning","gradient-boosting","model-training","classification","regression","cpu-only"],"install":[{"cmd":"pip install xgboost-cpu","lang":"bash","label":"Install CPU-optimized XGBoost"}],"dependencies":[{"reason":"This meta-package installs the core XGBoost library at a specific version.","package":"xgboost==2.0.3","optional":false},{"reason":"A compatible Python interpreter is required.","package":"python","optional":false}],"imports":[{"symbol":"XGBClassifier","correct":"from xgboost import XGBClassifier"},{"symbol":"XGBRegressor","correct":"from xgboost import XGBRegressor"},{"symbol":"DMatrix","correct":"from xgboost import DMatrix"},{"symbol":"train","correct":"from xgboost import train"}],"quickstart":{"code":"import xgboost as xgb\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.metrics import accuracy_score\nfrom sklearn.datasets import make_classification\nimport os\n\n# 1. Generate synthetic data for a classification task\nX, y = make_classification(n_samples=1000, n_features=20, n_informative=10, n_redundant=5, random_state=42)\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n\n# 2. 
Initialize the XGBoost classifier\n# 'objective' specifies the learning task.\n# 'eval_metric' defines the metric for evaluation during training.\n# The legacy `use_label_encoder` parameter was removed in XGBoost 2.0, so it is omitted here;\n# labels are expected to be pre-encoded integers.\n# `n_jobs` can be set to -1 to use all available CPU cores, or a specific number.\nmodel = xgb.XGBClassifier(\n    objective='binary:logistic',\n    eval_metric='logloss',\n    n_estimators=100,\n    learning_rate=0.1,\n    max_depth=5,\n    n_jobs=int(os.environ.get('XGB_N_JOBS', '-1')), # Example of reading n_jobs from an env var\n    random_state=42\n)\n\n# 3. Train the model\nmodel.fit(X_train, y_train)\n\n# 4. Make predictions on the test set\ny_pred = model.predict(X_test)\n\n# 5. Evaluate the model's performance\naccuracy = accuracy_score(y_test, y_pred)\nprint(f\"Model Accuracy: {accuracy:.4f}\")","lang":"python","description":"This quickstart demonstrates how to train a basic XGBoost classifier for a binary classification task on synthetic data. It showcases the scikit-learn-compatible `XGBClassifier` API."},"warnings":[{"fix":"Replace `ntree_limit` in prediction calls with `iteration_range`, e.g. `model.predict(X, iteration_range=(0, n))` to restrict prediction to the first `n` boosting rounds.","message":"The deprecated `ntree_limit` parameter for prediction was removed in XGBoost 2.0. Use `iteration_range` to predict with a subset of boosting rounds.","severity":"breaking","affected_versions":">=2.0"},{"fix":"Retrain models with XGBoost 2.x if possible. To migrate older models, re-save them with a version that can still read the legacy format, or retrain on the new version. Always save models using `model.save_model()` (JSON or UBJSON) rather than pickling for cross-version compatibility.","message":"XGBoost 2.0 introduced significant changes to the model save/load format (JSON/UBJSON vs. the older binary format) and the Python package structure. 
Models saved with older versions might not be directly loadable or usable with XGBoost 2.x without conversion.","severity":"breaking","affected_versions":">=2.0"},{"fix":"Run `import xgboost; print(xgboost.__version__)` to verify the actual core library version installed. Be aware of documentation and API differences between the meta-package version and the core library version.","message":"The `xgboost-cpu` package version (e.g., 3.2.0) refers to a distribution/installer version, not the core `xgboost` library version it installs. As of `xgboost-cpu==3.2.0`, it installs `xgboost==2.0.3`. Always check `xgboost.__version__` after installation to confirm the underlying library version.","severity":"gotcha","affected_versions":"All versions of `xgboost-cpu` and `xgboost-gpu` meta-packages"},{"fix":"Replace `gpu_id=0` with `device='cuda:0'` (or the appropriate device index) in your model parameters. For CPU-only use, `device='cpu'` can be set explicitly.","message":"The `gpu_id` parameter for selecting a GPU device is deprecated and replaced by the more general `device` parameter (e.g., `device='cuda:0'`, `device='cpu'`).","severity":"deprecated","affected_versions":">=2.0"},{"fix":"Convert your data to `DMatrix` objects before passing it to `xgboost.train` or `xgboost.cv`: `dtrain = xgb.DMatrix(X_train, label=y_train)`.","message":"The native training API (`xgboost.train` and `xgboost.cv`) accepts only `DMatrix` objects, not raw NumPy arrays or Pandas DataFrames. `DMatrix` is also more memory-efficient than raw arrays for very large datasets.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Ensure you have installed the correct package using `pip install xgboost-cpu`. 
If using `conda`, use `conda install -c conda-forge xgboost`.","cause":"The `xgboost` library is not installed in the currently active Python environment, or the environment is not correctly activated.","error":"ModuleNotFoundError: No module named 'xgboost'"},{"fix":"Verify that the feature names and their order in your prediction data match those used during training. Use `model.get_booster().feature_names` to inspect the expected names. If using `DMatrix`, ensure `feature_names` are passed correctly during its creation.","cause":"This commonly occurs when loading a pre-trained model and attempting to predict with a DataFrame that has different feature names or column order than the data used for training the model.","error":"ValueError: feature_names mismatch"},{"fix":"Check your input data (features and labels) to ensure it is not empty, does not contain entirely null values, and has valid dimensions before passing it to `DMatrix` or the model's `fit` method.","cause":"Attempting to create a `DMatrix` or train an XGBoost model with an empty dataset (e.g., input data with 0 rows or 0 columns, or all NaN values).","error":"XGBoostError: DMatrix is not allowed to be empty"},{"fix":"Convert the parameter value to an integer type. For example, use `n_jobs=int(os.environ.get('XGB_N_JOBS', '-1'))` or ensure direct assignments are `n_estimators=100` instead of `n_estimators='100'`.","cause":"A parameter that expects an integer value (e.g., `n_jobs`, `num_boost_round`, `random_state`) was passed a string. This can happen when reading configuration from environment variables or text files without proper type conversion.","error":"TypeError: 'str' object cannot be interpreted as an integer"}],"ecosystem":"pypi"}