{
  "id": 5524,
  "library": "torchrl",
  "title": "TorchRL",
  "description": "TorchRL is an open-source, PyTorch-native library for Reinforcement Learning (RL). It provides modular, primitive-first abstractions for building efficient and flexible RL solutions, with a focus on research and rapid prototyping. The library offers components for environments, data collection, replay buffers, policy and value networks, and loss functions, all designed to integrate seamlessly with the PyTorch ecosystem. It is currently at version 0.11.1 and follows a regular release cadence, often synced with PyTorch releases.",
  "status": "active",
  "version": "0.11.1",
  "language": "en",
  "source_language": "en",
  "source_url": "https://github.com/pytorch/rl",
  "tags": ["reinforcement-learning", "pytorch", "machine-learning", "deep-learning", "rlhf"],
  "install": [
    {"cmd": "pip install torchrl", "lang": "bash", "label": "Stable release"},
    {"cmd": "pip install tensordict-nightly torchrl-nightly", "lang": "bash", "label": "Nightly release"}
  ],
  "dependencies": [
    {"reason": "TorchRL is built on PyTorch and requires a compatible version (pip package `torch`).", "package": "torch", "optional": false},
    {"reason": "TorchRL's core data structure is TensorDict, requiring the tensordict library.", "package": "tensordict", "optional": false},
    {"reason": "Required for using environments from the Gymnasium library.", "package": "gymnasium", "optional": true},
    {"reason": "Required for the experimental command-line training interface (installed with `torchrl[utils]`).", "package": "hydra-core", "optional": true},
    {"reason": "Required for the experimental command-line training interface (installed with `torchrl[utils]`).", "package": "omegaconf", "optional": true}
  ],
  "imports": [
    {"note": "TensorDict is TorchRL's core data structure; it lives in the separate `tensordict` package, not in the `torchrl` namespace.", "symbol": "TensorDict", "correct": "from tensordict import TensorDict"},
    {"symbol": "GymEnv", "correct": "from torchrl.envs import GymEnv"},
    {"symbol": "MLP", "correct": "from torchrl.modules import MLP"},
    {"symbol": "QValueActor", "correct": "from torchrl.modules import QValueActor"},
    {"symbol": "PPOLoss", "correct": "from torchrl.objectives import PPOLoss"},
    {"symbol": "SyncDataCollector", "correct": "from torchrl.collectors import SyncDataCollector"}
  ],
  "quickstart": {
    "code": "from torchrl.envs import GymEnv\nfrom torchrl.modules import MLP, QValueActor\n\n# 1. Define the environment\nenv = GymEnv(\"CartPole-v1\")\n\n# 2. Create the policy (Q-value actor with an MLP backbone)\nactor = QValueActor(\n    MLP(\n        in_features=env.observation_spec[\"observation\"].shape[-1],\n        out_features=env.action_spec.shape[-1],  # 2 discrete actions for CartPole-v1\n        num_cells=[64, 64],\n    ),\n    in_keys=[\"observation\"],\n    spec=env.action_spec,\n)\n\n# 3. Collect a trajectory (returned as a TensorDict)\nrollout = env.rollout(max_steps=200, policy=actor)\n\n# Print collected info\nprint(f\"Collected {rollout.shape[0]} steps, total reward: {rollout['next', 'reward'].sum().item():.0f}\")\nprint(f\"Rollout keys: {list(rollout.keys())}\")\nprint(f\"Example observation shape: {rollout['observation'].shape}\")\n\nenv.close()\n",
    "lang": "python",
    "description": "This quickstart creates a simple Gym environment, defines a Q-value policy backed by an MLP, and collects a trajectory of up to 200 steps. The collected data is returned as a TensorDict."
  },
  "warnings": [
    {
      "fix": "Review the `torchrl.collectors` package documentation and examples for the restructured API and adjust collector instantiation and usage accordingly. The 5000+ line `collectors.py` was split into focused modules.",
      "message": "In TorchRL v0.11.0, the collector codebase underwent a major refactoring. Existing implementations of collectors, especially `SyncDataCollector`, `MultiSyncDataCollector`, and `aSyncDataCollector`, may require updates to align with the new modular structure.",
      "severity": "breaking",
      "affected_versions": ">=0.11.0"
    },
    {
      "fix": "Update code to use the new, recommended paths and classes for previously deprecated features. Refer to the v0.11 release notes for a complete list.",
      "message": "TorchRL v0.11.0 removed several deprecated features, replacing previous warnings with errors. This includes `KLRewardTransform` (use `torchrl.envs.llm.KLRewardTransform`), `LogReward` and `Recorder` (use `LogScalar` and `LogValidationReward`), and `unbatched_*_spec` properties from `VmasWrapper`/`VmasEnv` (use `full_*_spec_unbatched`).",
      "severity": "breaking",
      "affected_versions": ">=0.11.0"
    },
    {
      "fix": "Ensure you install or upgrade to the latest stable PyTorch release *before* installing TorchRL. If using an older PyTorch is necessary, you might need to install `functorch` compatible with your PyTorch version and then install `torchrl` from source.",
      "message": "Using TorchRL with PyTorch versions older than 2.0 (e.g., PyTorch 1.12 with Python 3.7) can lead to `ImportError: undefined symbol` errors when installing the stable `torchrl` package.",
      "severity": "gotcha",
      "affected_versions": "<2.0 (PyTorch) with stable `torchrl`"
    },
    {
      "fix": "Upgrade TorchRL to version 0.7.2 or newer. When using `ParallelEnv` or `BatchedEnv` with different devices for sub-environments and the batched environment, manage devices carefully to prevent data corruption or unexpected behavior. From 0.7.2 onward, data is automatically cast to the appropriate device during collection.",
      "message": "In TorchRL versions prior to 0.7.2, a critical issue existed where incorrect device settings in `ParallelEnv` could prevent tensors in buffers from being properly cloned, causing rollouts to return the same tensor instances across steps and potentially leading to incorrect behavior.",
      "severity": "gotcha",
      "affected_versions": "<0.7.2"
    },
    {
      "fix": "When initializing `PPOLoss`, explicitly pass `critic_coeff` instead of relying on `critic_network` for its default coefficient. For example, pass `critic_coeff=1.0` if a critic network is provided.",
      "message": "The `PPOLoss` class in TorchRL v0.11.0 issues a warning regarding the use of `critic_network` directly and suggests using the `critic_coeff` argument instead for better control over the critic's contribution to the loss.",
      "severity": "deprecated",
      "affected_versions": ">=0.11.0"
    }
  ],
  "env_vars": null,
  "last_verified": "2026-04-13T00:00:00.000Z",
  "next_check": "2026-07-12T00:00:00.000Z"
}