{"library":"lightgbm",
"title":"LightGBM",
"description":"LightGBM (Light Gradient Boosting Machine) is an open-source, high-performance gradient boosting framework developed by Microsoft. It uses tree-based learning algorithms and is designed for efficiency, scalability, and high accuracy, particularly on large datasets. Key innovations such as Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB) contribute to its faster training speed and lower memory usage. The library is actively maintained, with frequent releases, and is currently at version 4.6.0.",
"status":"active",
"version":"4.6.0",
"language":"en",
"source_language":"en",
"source_url":"https://github.com/microsoft/LightGBM.git",
"tags":["machine-learning","gradient-boosting","decision-trees","xgboost-alternative","tabular-data","microsoft"],
"install":[{"cmd":"pip install lightgbm","lang":"bash","label":"Basic Installation"},{"cmd":"pip install 'lightgbm[pandas,scikit-learn,dask]'","lang":"bash","label":"With Common Integrations"},{"cmd":"brew install libomp # For macOS users needing OpenMP","lang":"bash","label":"macOS OpenMP Dependency"}],
"dependencies":[{"reason":"Hard runtime dependency of the Python package; used for array handling throughout.","package":"numpy","optional":false},{"reason":"Integration with pandas DataFrames is a common use case; installable via 'lightgbm[pandas]'.","package":"pandas","optional":true},{"reason":"Provides a scikit-learn compatible API (LGBMClassifier, LGBMRegressor); installable via 'lightgbm[scikit-learn]'.","package":"scikit-learn","optional":true},{"reason":"For distributed training capabilities; installable via 'lightgbm[dask]'.","package":"dask","optional":true},{"reason":"Required for GPU support on Windows and Linux; often included with NVIDIA/AMD drivers.","package":"OpenCL Runtime libraries","optional":true},{"reason":"Required on macOS for OpenMP support and multithreading functionality.","package":"libomp","optional":true},{"reason":"Necessary for building LightGBM from source or for some advanced configurations, though wheels usually include a pre-compiled library.","package":"C++ Compiler (GCC/Clang on Linux/macOS, MSVC on Windows)","optional":true}],
"imports":[{"symbol":"lightgbm","correct":"import lightgbm as lgb"},{"symbol":"LGBMClassifier","correct":"from lightgbm import LGBMClassifier"},{"symbol":"LGBMRegressor","correct":"from lightgbm import LGBMRegressor"},{"note":"As of v4.0.0, the 'feature_name' and 'categorical_feature' parameters were removed from the `train()` and `cv()` functions; set them on the `Dataset` constructor (or via `Dataset.set_feature_name()`/`Dataset.set_categorical_feature()`) instead.","wrong":"lgb.train(params, lgb_train, feature_name=feature_names, categorical_feature=categorical_features)","symbol":"Dataset","correct":"lgb_train = lgb.Dataset(X_train, y_train, feature_name=feature_names, categorical_feature=categorical_features)"}],
"quickstart":{"code":"import numpy as np\nimport lightgbm as lgb\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.metrics import accuracy_score\n\n# Generate some dummy data\nX = np.random.rand(1000, 10)\ny = np.random.randint(0, 2, 1000)\n\n# Split data into training and testing sets\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n\n# Initialize and train the LGBMClassifier\n# Using scikit-learn API for convenience\nmodel = lgb.LGBMClassifier(objective='binary', random_state=42)\nmodel.fit(X_train, y_train,\n          eval_set=[(X_test, y_test)],\n          callbacks=[lgb.early_stopping(10)]) # Early stopping after 10 rounds without improvement\n\n# Make predictions\ny_pred = model.predict(X_test)\n\n# Evaluate the model\naccuracy = accuracy_score(y_test, y_pred)\nprint(f\"Model Accuracy: {accuracy:.4f}\")","lang":"python","description":"This quickstart demonstrates how to train a binary classification model using LightGBM's scikit-learn compatible API (`LGBMClassifier`). It covers data preparation, model initialization, training with early stopping, prediction, and evaluation. For the native (non-scikit-learn) API, use `lgb.Dataset` and `lgb.train`."},
"warnings":[{"fix":"Review the v4.0.0 release notes. Pass `feature_name` and `categorical_feature` to the `lgb.Dataset` constructor (or call `Dataset.set_feature_name()`/`Dataset.set_categorical_feature()`) instead of `train()`/`cv()`. Ensure `scikit-learn>=0.24.2` is installed if you use `LGBMClassifier`/`LGBMRegressor`.","message":"LightGBM v4.x introduced significant breaking changes: the `handle` attributes of `Booster` and `Dataset` became private, the hard `scikit-learn` dependency was removed (now optional), and the build switched to PEP 517/518 (removal of `setup.py`). The `feature_name` and `categorical_feature` parameters must now be set on the `lgb.Dataset` object directly rather than passed to `train()` or `cv()`. CUDA 10 support was dropped.","severity":"breaking","affected_versions":"4.0.0 and above"},{"fix":"Explicitly convert all categorical features to `int` (e.g., with `LabelEncoder` or pandas `astype('category').cat.codes`) before creating `lgb.Dataset` or fitting `LGBMClassifier`/`LGBMRegressor`. Values should ideally range from 0 to `num_categories - 1`.","message":"LightGBM handles categorical features natively, but they must be encoded as non-negative integers (0, 1, 2, ...). Passing non-integer or very large integer values as categorical features can trigger warnings or unexpected behavior.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Tune `num_leaves`, `max_depth`, `min_data_in_leaf`, and the regularization parameters (`lambda_l1`, `lambda_l2`). Always use early stopping with a validation set during training. Consider `min_sum_hessian_in_leaf` and data/feature bagging (`bagging_fraction`, `feature_fraction`).","message":"LightGBM is prone to overfitting, especially on small datasets (under roughly 10,000 records) or with excessively deep trees.","severity":"gotcha","affected_versions":"All versions"},{"fix":"For basic GPU usage, ensure OpenCL Runtime libraries are installed (often shipped with GPU drivers) and set `device='gpu'` in the parameters. For specific CUDA versions or advanced GPU features, consult the official LightGBM installation guide for building from source or specialized wheels.","message":"GPU acceleration requires setup beyond `pip install lightgbm`. Although newer versions (v4.x) improved CUDA support, you typically still need OpenCL Runtime libraries, and some advanced GPU features or specific CUDA versions may require building from source.","severity":"gotcha","affected_versions":"All versions, especially 3.x and earlier for CUDA compatibility"},{"fix":"Set `num_threads=1` in your LightGBM parameters to disable LightGBM's internal multithreading when using fork-based parallelism.","message":"On Linux, LightGBM can hang when its OpenMP multithreading is combined with forking (e.g., in `multiprocessing` scenarios); this is a known issue.","severity":"gotcha","affected_versions":"All versions (Linux)"}],
"env_vars":null,
"last_verified":"2026-04-05T00:00:00.000Z",
"next_check":"2026-07-04T00:00:00.000Z"}