Tianshou

v2.0.1 · verified Fri May 01 · Python

A library for deep reinforcement learning, providing a modular and flexible framework for implementing and benchmarking RL algorithms. Current version is 2.0.1, released in 2025. The library follows a steady release cadence, with major version 2.0 overhauling the procedural API and separating algorithms from policies.

pip install tianshou
error ModuleNotFoundError: No module named 'tianshou.policy'
cause Using an older version of tianshou (<2.0.0) that does not have the 'policy' submodule.
fix
Upgrade to tianshou>=2.0.0: pip install --upgrade tianshou.
error AttributeError: module 'tianshou' has no attribute 'Collector'
cause Collector was moved to tianshou.data in v2.0.0.
fix
Change import to from tianshou.data import Collector.
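A one-line check of the corrected import path:

from tianshou.data import Collector  # lives in tianshou.data, not at the top level
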
error ValueError: The number of buffers in VectorReplayBuffer must be greater than 0
cause VectorReplayBuffer requires buffer_num > 0; passing zero, or deriving buffer_num from an empty environment list, triggers this error.
fix
Ensure you provide a positive integer for buffer_num, typically equal to the number of vector environments.
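A minimal construction sketch; the sizes below are illustrative, not requirements:

from tianshou.data import VectorReplayBuffer

num_envs = 8  # must match the number of vectorized environments
buffer = VectorReplayBuffer(total_size=20000, buffer_num=num_envs)  # buffer_num > 0
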
error gym.error.Error: Cannot instantiate environment. Possibly the environment id is misspelled or the environment is not installed.
cause Tianshou v2.0.0 uses gymnasium as the environment backend. The error appears when gymnasium is not installed, or when a custom environment was registered under the old gym API.
fix
Install gymnasium: pip install gymnasium. If using custom environments, ensure they are compatible with gymnasium.
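A minimal gymnasium-backed setup, assuming CartPole-v1 is available:

import gymnasium as gym

env = gym.make('CartPole-v1')  # gymnasium, not the legacy gym package
obs, info = env.reset()        # gymnasium's reset returns an (obs, info) pair
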
breaking v2.0.0 completely overhauled the API. `Policy` and `Algorithm` abstractions are separated; many imports changed. Old code using `from tianshou import Policy` will break.
fix Update imports to use submodules (e.g., `from tianshou.policy import ...`) and adopt the new `Algorithm`/`Policy` split.
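A before/after sketch of the import change; `DQNPolicy` here is illustrative, and the exact v2 class layout should be checked against the release notes:

# from tianshou import Policy          # old top-level import, breaks in v2.0.0
from tianshou.policy import DQNPolicy  # submodule-style import
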
breaking In v2.0.0, `ReplayBuffer` remains in `tianshou.data`, but some of its internal APIs changed. `VectorReplayBuffer` is now the recommended buffer for vectorized environments.
fix Use `from tianshou.data import VectorReplayBuffer` and ensure buffer API calls are updated.
gotcha Tianshou 2.0.0+ requires Python 3.11+ and drops support for older Python versions. Also, `gymnasium` is used instead of `gym`.
fix Upgrade to Python 3.11+ and install `gymnasium` if you rely on custom environments.
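A defensive startup check, assuming the version floor stated above:

import sys

if sys.version_info < (3, 11):
    raise RuntimeError('Tianshou 2.x requires Python 3.11+')
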
deprecated The `trainer` module's API (e.g., `offpolicy_trainer`) is being replaced in future versions by a more streamlined high-level interface. Check documentation for upcoming changes.
fix Consider using the high-level `experiment` package for new projects.

Train a DQN agent on CartPole-v1 using off-policy training.

import gymnasium as gym
import torch

from tianshou.data import Collector, VectorReplayBuffer
from tianshou.env import DummyVectorEnv
from tianshou.policy import DQNPolicy
from tianshou.trainer import OffpolicyTrainer
from tianshou.utils.net.common import Net

env = gym.make('CartPole-v1')
train_envs = DummyVectorEnv([lambda: gym.make('CartPole-v1') for _ in range(10)])
test_envs = DummyVectorEnv([lambda: gym.make('CartPole-v1') for _ in range(10)])

state_shape = env.observation_space.shape or env.observation_space.n
action_shape = env.action_space.shape or env.action_space.n

# Build a Q-network with Tianshou's MLP helper (hidden_sizes is a free choice),
# then hand the network and its optimizer to the policy.
net = Net(state_shape=state_shape, action_shape=action_shape, hidden_sizes=[128, 128])
optim = torch.optim.Adam(net.parameters(), lr=1e-3)

policy = DQNPolicy(
    model=net,
    optim=optim,
    action_space=env.action_space,  # required by the 1.x-style API; older releases omit it
    discount_factor=0.99,
    estimation_step=3,
    target_update_freq=320,
)
buffer = VectorReplayBuffer(total_size=20000, buffer_num=len(train_envs))
train_collector = Collector(policy, train_envs, buffer, exploration_noise=True)
test_collector = Collector(policy, test_envs, exploration_noise=True)

def stop_fn(mean_rewards):
    return mean_rewards >= 195

# Class-based trainer from the 1.x-style API; earlier releases exposed the same
# settings through the offpolicy_trainer function (see the deprecation note above).
result = OffpolicyTrainer(
    policy=policy,
    train_collector=train_collector,
    test_collector=test_collector,
    max_epoch=10,
    step_per_epoch=1000,
    step_per_collect=10,
    episode_per_test=5,
    batch_size=64,
    update_per_step=1,
    train_fn=lambda epoch, env_step: policy.set_eps(0.1),   # epsilon-greedy exploration
    test_fn=lambda epoch, env_step: policy.set_eps(0.05),   # near-greedy evaluation
    stop_fn=stop_fn,
).run()
print(f'Finished training in {result.timing.total_time:.2f}s')