{"id":4397,"library":"stable-baselines3","title":"Stable Baselines3","description":"Stable Baselines3 (SB3) is a comprehensive Python library offering reliable implementations of reinforcement learning (RL) algorithms in PyTorch. It provides a clean and simple API, adhering to a scikit-learn-like syntax for training, evaluating, and deploying RL agents. SB3 is actively maintained with frequent releases, supporting state-of-the-art model-free RL algorithms like A2C, PPO, SAC, DQN, and TD3.","status":"active","version":"2.8.0","language":"en","source_language":"en","source_url":"https://github.com/DLR-RM/stable-baselines3","tags":["reinforcement-learning","pytorch","deep-learning","ai","gymnasium"],"install":[{"cmd":"pip install stable-baselines3 gymnasium","lang":"bash","label":"Basic Installation"},{"cmd":"pip install stable-baselines3[extra] gymnasium","lang":"bash","label":"Installation with extras (e.g., rendering)"}],"dependencies":[{"reason":"Stable Baselines3 v2.8.0 requires Python 3.10 or newer. Support for Python 3.9 was dropped in v2.8.0, and 3.8 in v2.5.0.","package":"python","optional":false},{"reason":"The library is built on PyTorch. A minimum version of 2.3.0 is required since SB3 v2.5.0.","package":"torch","optional":false},{"reason":"Gymnasium is the primary environment backend since SB3 v2.0.0. While older gym versions might be compatible via `shimmy`, direct use of Gymnasium is recommended.","package":"gymnasium","optional":false}],"imports":[{"symbol":"PPO","correct":"from stable_baselines3 import PPO"},{"symbol":"A2C","correct":"from stable_baselines3 import A2C"},{"symbol":"SAC","correct":"from stable_baselines3 import SAC"},{"symbol":"DQN","correct":"from stable_baselines3 import DQN"},{"symbol":"make_vec_env","correct":"from stable_baselines3.common.env_util import make_vec_env"},{"symbol":"evaluate_policy","correct":"from stable_baselines3.common.evaluation import evaluate_policy"}],"quickstart":{"code":"import gymnasium as gym\nfrom stable_baselines3 import A2C\n\n# Create environment\nenv = gym.make(\"CartPole-v1\")\n\n# Instantiate the agent\nmodel = A2C(\"MlpPolicy\", env, verbose=1)\n\n# Train the agent\nmodel.learn(total_timesteps=10000)\n\n# Save the model\nmodel.save(\"a2c_cartpole\")\n\n# Delete model and reload it to demonstrate saving and loading\ndel model\nmodel = A2C.load(\"a2c_cartpole\")\n\n# Run the trained agent\nobs, info = env.reset()\nfor i in range(1000):\n    action, _states = model.predict(obs, deterministic=True)\n    obs, reward, terminated, truncated, info = env.step(action)\n    if terminated or truncated:\n        obs, info = env.reset()\nenv.close()\n","lang":"python","description":"This quickstart demonstrates how to create a Gymnasium environment, instantiate an A2C agent, train it for a specified number of timesteps, save and load the trained model, and finally run the trained agent in the environment."},"warnings":[{"fix":"Upgrade your Python environment to version 3.10 or higher.","message":"Dropped Python 3.9 support in v2.8.0. Users on Python 3.9 must upgrade to Python >= 3.10. Similarly, Python 3.8 support was removed in v2.5.0/v2.4.0.","severity":"breaking","affected_versions":">=2.8.0 (for Python 3.9), >=2.5.0 (for Python 3.8)"},{"fix":"Upgrade PyTorch to version 2.3.0 or newer (e.g., `pip install torch>=2.3.0`).","message":"The minimum required PyTorch version increased to 2.3.0 in Stable Baselines3 v2.5.0. Ensure your PyTorch installation meets this requirement.","severity":"breaking","affected_versions":">=2.5.0"},{"fix":"Replace `import gym` with `import gymnasium as gym` and update environment creation where necessary. Install `shimmy` if you need to wrap legacy Gym environments.","message":"Stable Baselines3 switched to Gymnasium as its primary environment backend starting from v2.0.0. While compatibility layers exist via `shimmy` for older `gym` environments, direct migration to Gymnasium is highly recommended.","severity":"breaking","affected_versions":">=2.0.0"},{"fix":"Upgrade Stable Baselines3 to v2.3.2 or newer if you encounter loading issues with PyTorch 1.13.","message":"Stable Baselines3 v2.3.0 introduced a breaking change where `torch.load()` was called with `weights_only=True`, causing issues when loading policies trained with PyTorch 1.13. This was reverted in v2.3.2.","severity":"breaking","affected_versions":"2.3.0, 2.3.1"},{"fix":"Ensure that any iterables passed to internal `zip` operations (or `zip` calls within custom code interacting with SB3) have consistent lengths.","message":"Starting from v2.8.0, `strict=True` is set for every internal call to `zip(...)`, which can raise `ValueError` if iterables have different lengths. This change also applies to `sb3_contrib` (v2.6.0).","severity":"breaking","affected_versions":">=2.8.0 (SB3), >=2.6.0 (SB3-Contrib)"},{"fix":"Review custom callbacks to explicitly return `True` or `False` to control training flow.","message":"When using custom callbacks, ensure they return a boolean (`True` to continue, `False` to stop training). Since `stable-baselines3-contrib` v2.6.0 (which also affects SB3), returning `None` is interpreted as `False` and abruptly stops training.","severity":"gotcha","affected_versions":">=2.6.0 (SB3-Contrib, affecting SB3 users of custom callbacks)"},{"fix":"Wrap your environment with `Monitor` early in the environment stacking process: `env = Monitor(env)`.","message":"For accurate evaluation results, especially when other wrappers modify rewards or episode lengths (e.g., reward scaling), it is recommended to wrap your environment with the `Monitor` wrapper before any other wrappers.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-12T00:00:00.000Z","next_check":"2026-07-11T00:00:00.000Z"}