{"id":24710,"library":"tianshou","title":"Tianshou","description":"A library for deep reinforcement learning, providing a modular and flexible framework for implementing and benchmarking RL algorithms. The current version is 2.0.1, released in 2025. The library follows a steady release cadence; major version 2.0 overhauled the procedural API and separated algorithms from policies.","status":"active","version":"2.0.1","language":"python","source_language":"en","source_url":"https://github.com/thu-ml/tianshou","tags":["reinforcement-learning","deep-learning","pytorch","gymnasium","rl"],"install":[{"cmd":"pip install tianshou","lang":"bash","label":"Install from PyPI"}],"dependencies":[{"reason":"Environment interface","package":"gymnasium","optional":false},{"reason":"Neural network backend","package":"torch","optional":false},{"reason":"Numerical operations","package":"numpy","optional":false},{"reason":"Logging and visualization","package":"tensorboard","optional":true}],"imports":[{"note":"v2.0.0 moved Policy to a submodule.","wrong":"from tianshou import Policy","symbol":"Policy","correct":"from tianshou.policy import Policy"},{"note":"Collector was moved to tianshou.data after v1.0.0.","wrong":"from tianshou import Collector","symbol":"Collector","correct":"from tianshou.data import Collector"},{"note":"ReplayBuffer lives in tianshou.data; there is no tianshou.replay module.","wrong":"from tianshou.replay import ReplayBuffer","symbol":"ReplayBuffer","correct":"from tianshou.data import ReplayBuffer"}],"quickstart":{"code":"import gymnasium as gym\nimport torch\n\nfrom tianshou.data import Collector, VectorReplayBuffer\nfrom tianshou.env import DummyVectorEnv\nfrom tianshou.policy import DQNPolicy\nfrom tianshou.trainer import offpolicy_trainer\nfrom tianshou.utils.net.common import Net\n\nenv = gym.make('CartPole-v1')\ntrain_envs = DummyVectorEnv([lambda: gym.make('CartPole-v1') for _ in range(10)])\ntest_envs = DummyVectorEnv([lambda: gym.make('CartPole-v1') for _ in range(10)])\n\nstate_shape = env.observation_space.shape or env.observation_space.n\naction_shape = env.action_space.shape or env.action_space.n\n\n# Build the Q-network and its optimizer before constructing the policy.\nnet = Net(state_shape=state_shape, action_shape=action_shape, hidden_sizes=[64, 64])\noptim = torch.optim.Adam(net.parameters(), lr=1e-3)\n\npolicy = DQNPolicy(\n    model=net,\n    optim=optim,\n    action_space=env.action_space,\n    discount_factor=0.99,\n    target_update_freq=320,\n)\nbuffer = VectorReplayBuffer(total_size=20000, buffer_num=len(train_envs))\ncollector = Collector(policy, train_envs, buffer)\ntest_collector = Collector(policy, test_envs)\n\ndef stop_fn(mean_reward):\n    return mean_reward >= 195\n\nresult = offpolicy_trainer(\n    policy=policy,\n    train_collector=collector,\n    test_collector=test_collector,\n    max_epoch=10,\n    step_per_epoch=1000,\n    step_per_collect=10,\n    episode_per_test=5,\n    batch_size=64,\n    stop_fn=stop_fn,\n    update_per_step=1,\n)\nprint(f'Finished training in {result.timing.total_time_seconds}s')","lang":"python","description":"Train a DQN agent on CartPole-v1 using off-policy training."},"warnings":[{"fix":"Update imports to use submodules (e.g., `from tianshou.policy import ...`) and adopt the new `Algorithm`/`Policy` split.","message":"v2.0.0 completely overhauled the API. The `Policy` and `Algorithm` abstractions are now separate, and many imports changed. Old code using `from tianshou import Policy` will break.","severity":"breaking","affected_versions":"<2.0.0"},{"fix":"Use `from tianshou.data import VectorReplayBuffer` and ensure buffer API calls are updated.","message":"In v2.0.0, `ReplayBuffer` remains in `tianshou.data`, but some of its internal APIs changed. `VectorReplayBuffer` is now the recommended buffer for vectorized environments.","severity":"breaking","affected_versions":"<2.0.0"},{"fix":"Upgrade to Python 3.11+ and install `gymnasium` if you rely on custom environments.","message":"Tianshou 2.0.0+ requires Python 3.11+ and drops support for older Python versions. 
Also, `gymnasium` is used instead of `gym`.","severity":"gotcha","affected_versions":">=2.0.0"},{"fix":"Consider using the high-level `experiment` package for new projects.","message":"The `trainer` module's API (e.g., `offpolicy_trainer`) is being replaced in future versions by a more streamlined high-level interface. Check the documentation for upcoming changes.","severity":"deprecated","affected_versions":">=2.0.0"}],"env_vars":null,"last_verified":"2026-05-01T00:00:00.000Z","next_check":"2026-07-30T00:00:00.000Z","problems":[{"fix":"Upgrade to tianshou>=2.0.0: `pip install --upgrade tianshou`.","cause":"Using an older version of tianshou (<2.0.0) that does not have the 'policy' submodule.","error":"ModuleNotFoundError: No module named 'tianshou.policy'"},{"fix":"Change import to `from tianshou.data import Collector`.","cause":"Collector was moved to tianshou.data in v2.0.0.","error":"AttributeError: module 'tianshou' has no attribute 'Collector'"},{"fix":"Ensure you provide a positive integer for buffer_num, typically equal to the number of vector environments.","cause":"VectorReplayBuffer requires buffer_num > 0, but you may have passed an empty list or zero.","error":"ValueError: The number of buffers in VectorReplayBuffer must be greater than 0"},{"fix":"Install gymnasium: `pip install gymnasium`. If using custom environments, ensure they are compatible with gymnasium.","cause":"Tianshou v2.0.0 uses gymnasium as the environment backend. This error occurs when `gym.make` is called without gymnasium installed, or when the environment was registered under the old gym API.","error":"gym.error.Error: Cannot instantiate environment. Possibly the environment id is misspelled or the environment is not installed."}],"ecosystem":"pypi","meta_description":null,"install_score":null,"install_tag":null,"quickstart_score":null,"quickstart_tag":null}