Tianshou
A library for deep reinforcement learning, providing a modular and flexible framework for implementing and benchmarking RL algorithms. Current version is 2.0.1, released in 2025. The library follows a steady release cadence, with major version 2.0 overhauling the procedural API and separating algorithms from policies.
pip install tianshou
Common errors
error ModuleNotFoundError: No module named 'tianshou.policy' ↓
cause Using an older version of tianshou (<2.0.0) that does not have the 'policy' submodule.
fix
Upgrade to tianshou>=2.0.0:
pip install --upgrade tianshou.
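To confirm which version is actually installed, a quick check (tianshou exposes the standard __version__ attribute):

import tianshou
print(tianshou.__version__)  # expect 2.0.0 or later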
error AttributeError: module 'tianshou' has no attribute 'Collector' ↓
cause Collector was moved to tianshou.data in v2.0.0.
fix
Change import to
from tianshou.data import Collector.
error ValueError: The number of buffers in VectorReplayBuffer must be greater than 0 ↓
cause VectorReplayBuffer requires buffer_num > 0; passing zero (for example, a buffer count derived from an empty environment list) triggers this error.
fix
Ensure you provide a positive integer for buffer_num, typically equal to the number of vector environments.
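A minimal sketch of a correctly sized buffer, mirroring the Quickstart below; buffer_num is tied to the number of vectorized environments:

import gymnasium as gym
from tianshou.data import VectorReplayBuffer
from tianshou.env import DummyVectorEnv

train_envs = DummyVectorEnv([lambda: gym.make('CartPole-v1') for _ in range(4)])
buffer = VectorReplayBuffer(total_size=20000, buffer_num=len(train_envs))  # buffer_num must be > 0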
error gym.error.Error: Cannot instantiate environment. Possibly the environment id is misspelled or the environment is not installed. ↓
cause Tianshou v2.0.0 uses gymnasium as the environment backend. This error appears if gymnasium is not installed, or if a custom environment was registered under the old gym API.
fix
Install gymnasium:
pip install gymnasium. If using custom environments, ensure they are compatible with the gymnasium API; a minimal skeleton follows.
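For custom environments, the skeleton below sketches the gymnasium interface; the class name and spaces are illustrative placeholders, and the essential points are that reset returns (obs, info) and step returns a five-element tuple:

import gymnasium as gym
import numpy as np

class MyEnv(gym.Env):  # hypothetical example environment
    def __init__(self):
        self.observation_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32)
        self.action_space = gym.spaces.Discrete(2)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        obs = np.zeros(4, dtype=np.float32)
        return obs, {}  # gymnasium reset returns (obs, info)

    def step(self, action):
        obs = np.zeros(4, dtype=np.float32)
        return obs, 0.0, False, False, {}  # (obs, reward, terminated, truncated, info)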
Warnings
breaking v2.0.0 completely overhauled the API. `Policy` and `Algorithm` abstractions are separated; many imports changed. Old code using `from tianshou import Policy` will break. ↓
fix Update imports to use submodules (e.g., `from tianshou.policy import ...`) and adopt the new `Algorithm`/`Policy` split.
breaking In v2.0.0, `ReplayBuffer` remains in `tianshou.data`, but some internal APIs changed. `VectorReplayBuffer` is now the recommended buffer for vectorized environments. ↓
fix Use `from tianshou.data import VectorReplayBuffer` and ensure buffer API calls are updated.
gotcha Tianshou 2.0.0+ requires Python 3.11+ and drops support for older Python versions. Also, `gymnasium` is used instead of `gym`. ↓
fix Upgrade to Python 3.11+ and install `gymnasium` if you rely on custom environments.
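A simple guard for scripts, based on the requirement above (the assertion message is illustrative):

import sys
assert sys.version_info >= (3, 11), 'Tianshou 2.x requires Python 3.11+'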
deprecated The `trainer` module's API (e.g., `offpolicy_trainer`) is being replaced in future versions by a more streamlined high-level interface. Check documentation for upcoming changes. ↓
fix Consider using the high-level `experiment` package for new projects.
Imports
- Policy
  wrong: from tianshou import Policy
  correct: from tianshou.policy import Policy
- Collector
  wrong: from tianshou import Collector
  correct: from tianshou.data import Collector
- ReplayBuffer
  wrong: from tianshou.replay import ReplayBuffer
  correct: from tianshou.data import ReplayBuffer
Quickstart
import gymnasium as gym
import torch

from tianshou.data import Collector, VectorReplayBuffer
from tianshou.env import DummyVectorEnv
from tianshou.policy import DQNPolicy
from tianshou.trainer import offpolicy_trainer  # classic trainer API; see the deprecation note above
from tianshou.utils.net.common import Net
env = gym.make('CartPole-v1')
train_envs = DummyVectorEnv([lambda: gym.make('CartPole-v1') for _ in range(10)])
test_envs = DummyVectorEnv([lambda: gym.make('CartPole-v1') for _ in range(10)])
state_shape = env.observation_space.shape or env.observation_space.n
action_shape = env.action_space.shape or env.action_space.n
# Net builds an MLP mapping flattened observations to Q-values.
net = Net(state_shape=state_shape, action_shape=action_shape, hidden_sizes=[128, 128])
optim = torch.optim.Adam(net.parameters(), lr=1e-3)
policy = DQNPolicy(
    model=net,
    optim=optim,
    action_space=env.action_space,
    discount_factor=0.99,
    target_update_freq=320,
).to('cpu')
buffer = VectorReplayBuffer(total_size=20000, buffer_num=len(train_envs))
collector = Collector(policy, train_envs, buffer, exploration_noise=True)
test_collector = Collector(policy, test_envs, exploration_noise=True)
def stop_fn(mean_rewards):
    return mean_rewards >= 475  # CartPole-v1's reward threshold (195 was CartPole-v0's)
result = offpolicy_trainer(
    policy=policy,
    train_collector=collector,
    test_collector=test_collector,
    max_epoch=10,
    step_per_epoch=1000,
    step_per_collect=10,
    episode_per_test=5,
    batch_size=64,
    # epsilon-greedy schedule: explore during training, act near-greedily in tests
    train_fn=lambda epoch, env_step: policy.set_eps(0.1),
    test_fn=lambda epoch, env_step: policy.set_eps(0.05),
    stop_fn=stop_fn,
    update_per_step=1,
)
print(f'Finished training: {result}')  # the result object's exact type varies across versions
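After training, the same test collector can run a short near-greedy evaluation; this follows the classic API used above (set_eps and Collector.collect are long-standing tianshou methods):

policy.eval()
policy.set_eps(0.05)  # near-greedy action selection for evaluation
test_collector.reset()
eval_result = test_collector.collect(n_episode=10)
print(eval_result)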