{"id":7700,"library":"sb3-contrib","title":"Stable Baselines3 Contrib","description":"sb3-contrib is the experimental contribution package for Stable Baselines3, providing additional reinforcement learning algorithms and features not yet integrated into the main SB3 library. It is currently at version 2.8.0 and typically releases new versions in sync with Stable Baselines3's major and minor updates, often introducing breaking changes related to Python or SB3 dependency versions.","status":"active","version":"2.8.0","language":"en","source_language":"en","source_url":"https://github.com/Stable-Baselines-Team/stable-baselines3-contrib","tags":["reinforcement learning","deep learning","AI","algorithms","stable-baselines3"],"install":[{"cmd":"pip install sb3-contrib stable-baselines3 gymnasium","lang":"bash","label":"Recommended Installation"}],"dependencies":[{"reason":"Core dependency; sb3-contrib algorithms extend stable-baselines3.","package":"stable-baselines3","optional":false},{"reason":"Standard environment interface for training RL agents.","package":"gymnasium","optional":true},{"reason":"Underlying deep learning framework used by Stable Baselines3.","package":"torch","optional":false}],"imports":[{"symbol":"MaskablePPO","correct":"from sb3_contrib import MaskablePPO"},{"symbol":"RecurrentPPO","correct":"from sb3_contrib import RecurrentPPO"},{"symbol":"ARS","correct":"from sb3_contrib import ARS"},{"symbol":"TRPO","correct":"from sb3_contrib import TRPO"},{"symbol":"CrossQ","correct":"from sb3_contrib import CrossQ"},{"note":"QRDQN is provided by sb3-contrib; it is not part of the stable_baselines3 core package.","wrong":"from stable_baselines3 import QRDQN","symbol":"QRDQN","correct":"from sb3_contrib import QRDQN"}],"quickstart":{"code":"import gymnasium as gym\nfrom sb3_contrib import ARS\nfrom stable_baselines3.common.env_util import make_vec_env\nfrom stable_baselines3.common.vec_env import VecMonitor\n\n# 1. 
Create a vectorized environment\nenv_id = \"CartPole-v1\"\nvec_env = make_vec_env(env_id, n_envs=4, seed=0)\nvec_env = VecMonitor(vec_env) # Recommended wrapper for logging\n\n# 2. Initialize the ARS agent\n# ARS (Augmented Random Search) is a gradient-free random-search algorithm\nmodel = ARS(\"MlpPolicy\", vec_env, verbose=1)\n\n# 3. Train the agent\nprint(\"Training the ARS model...\")\nmodel.learn(total_timesteps=10000)\nprint(\"Training finished.\")\n\n# 4. Save and load the model (optional)\nmodel.save(\"ars_cartpole\")\ndel model # remove to demonstrate loading\nmodel = ARS.load(\"ars_cartpole\")\n\n# 5. Evaluate the trained agent\nprint(\"Evaluating the trained model...\")\nobs = vec_env.reset() # VecEnv.reset() returns only the observations\nfor _ in range(100): # Run for 100 steps\n    action, _states = model.predict(obs, deterministic=True)\n    obs, rewards, dones, infos = vec_env.step(action)\n    # VecEnvs reset automatically; VecMonitor reports episode stats when one ends\n    for i, done in enumerate(dones):\n        if done:\n            print(f\"Episode finished, reward: {infos[i]['episode']['r']:.2f}\")\nvec_env.close()","lang":"python","description":"This quickstart demonstrates how to set up a vectorized Gymnasium environment and train an ARS (Augmented Random Search) agent from sb3-contrib. It covers environment creation, model initialization, training, saving and loading, and basic evaluation."},"warnings":[{"fix":"Upgrade your Python environment to 3.10 or later. For example, use pyenv or update your Conda environment.","message":"Python 3.9 support was removed in v2.8.0. Earlier versions (v2.5.0, v2.1.0) dropped support for Python 3.8 and 3.7 respectively. Ensure your Python version meets the minimum requirement (>=3.10 for v2.8.0).","severity":"breaking","affected_versions":">=2.1.0"},{"fix":"Always install/upgrade both packages together: `pip install --upgrade stable-baselines3 sb3-contrib`.","message":"sb3-contrib is tightly coupled with `stable-baselines3`. 
New versions of `sb3-contrib` frequently require specific, often newer, versions of `stable-baselines3` (e.g., v2.8.0 requires SB3 >= 2.8.0).","severity":"breaking","affected_versions":"All versions"},{"fix":"Import `QRDQN` from `sb3_contrib`: `from sb3_contrib import QRDQN`.","message":"The `QRDQN` algorithm is provided by `sb3-contrib`, not by the core `stable_baselines3` library. Attempting to import it from `stable_baselines3` will result in an `ImportError`.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Implement `action_masks()` on your custom environment, or wrap the environment with `ActionMasker` from `sb3_contrib.common.wrappers`. When using `model.predict()`, pass the masks explicitly: `model.predict(obs, action_masks=env.action_masks())`.","message":"`MaskablePPO` requires the environment to expose an `action_masks()` method that returns a boolean numpy array marking the valid actions. This is not a standard `gymnasium.Env` or `VecEnv` feature. `RecurrentPPO` has no such requirement.","severity":"gotcha","affected_versions":"All versions"},{"fix":"To keep the pre-2.3.0 behavior, pass `learning_starts=50_000` to `QRDQN` explicitly; otherwise retune your hyperparameters for the new default.","message":"The default `learning_starts` parameter for `QRDQN` was changed in `sb3-contrib` v2.3.0 (from 50_000 to 100) to align with other off-policy algorithms. 
This can drastically alter training behavior if not explicitly set.","severity":"breaking","affected_versions":">=2.3.0"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Change your import statement to `from sb3_contrib import QRDQN`.","cause":"`QRDQN` is provided by `sb3-contrib`; it is not part of the core `stable_baselines3` library.","error":"ImportError: cannot import name 'QRDQN' from 'stable_baselines3'"},{"fix":"Implement the `action_masks()` method in your custom `gymnasium.Env`, or wrap the environment with `ActionMasker` from `sb3_contrib.common.wrappers`. If using a `VecEnv`, ensure the underlying environments provide masks. For `model.predict`, pass `action_masks` explicitly, for example using `get_action_masks(env)` from `sb3_contrib.common.maskable.utils`.","cause":"When using `MaskablePPO`, the environment (or its wrapper chain) must implement an `action_masks()` method to provide valid action masks. `RecurrentPPO` does not use action masks.","error":"AttributeError: 'VecMonitor' object has no attribute 'action_masks'"},{"fix":"Upgrade both packages to their latest compatible versions: `pip install --upgrade stable-baselines3 sb3-contrib`.","cause":"`sb3-contrib` requires a specific, often very recent, version of `stable-baselines3` for compatibility.","error":"stable_baselines3.common.utils.SB3DeprecationWarning: You are using an outdated version of Stable-Baselines3"},{"fix":"Review the definitions of your environment's observation and action spaces, and any custom data structures passed into the model, to ensure all sequences that are zipped together are of consistent and expected lengths.","cause":"From `sb3-contrib` v2.8.0, `zip` calls were updated to use `strict=True`, which raises a `ValueError` if the sequences being zipped have different lengths. This often points to inconsistencies in environment observation/action spaces or custom data.","error":"ValueError: zip() argument 2 is shorter than argument 1"}]}