Gym (OpenAI Gym)
Gym (formerly OpenAI Gym) is a Python library that provides a universal API for developing and comparing reinforcement learning (RL) algorithms across a diverse collection of environments. While it was historically the standard for RL environments, the `gym` library is no longer actively maintained. All future development and support have transitioned to its successor, `gymnasium`, which is a drop-in replacement apart from the import name. The last release of `gym` was version 0.26.2 (October 2022); the 0.26.x line introduced significant breaking API changes.
Warnings
- breaking The `gym` library is no longer maintained; all future development and support have moved to `gymnasium`. Users are strongly encouraged to migrate to `gymnasium` for continued updates, bug fixes, and compatibility with modern Python and NumPy versions.
- breaking The `env.step()` method now returns a 5-tuple: `(observation, reward, terminated, truncated, info)`. The old `done` flag is split into `terminated` (agent's action led to termination) and `truncated` (e.g., time limit reached).
- breaking The `env.reset()` method now returns a 2-tuple: `(observation, info)`. The `return_info` parameter has been removed.
- breaking The `env.seed()` method has been removed. Environment seeding is now handled by passing a `seed` argument to `env.reset()`.
- breaking The `render_mode` should be specified during `gym.make()` (e.g., `gym.make('Env-v1', render_mode='human')`) and is no longer passed to the `env.render()` method.
- gotcha Many environments require additional dependencies beyond the base `pip install gym`. Attempting to `gym.make()` such an environment without its extras will result in `ModuleNotFoundError`.
Install
- pip install gym
- pip install 'gym[atari]' # Example for Atari environments
- pip install 'gym[all]' # Install all supported environments
Imports
- gym
import gym
- make
env = gym.make('CartPole-v1')
Quickstart
import gym
env = gym.make("CartPole-v1", render_mode="human")
# Reset returns (observation, info) in 0.26.x+
observation, info = env.reset(seed=42)
episode_steps = 0
for _ in range(1000):
    action = env.action_space.sample()  # Agent selects a random action
    # Step returns (observation, reward, terminated, truncated, info) in 0.26.x+
    observation, reward, terminated, truncated, info = env.step(action)
    episode_steps += 1
    if terminated or truncated:
        print(f"Episode finished after {episode_steps} timesteps.")
        observation, info = env.reset()  # Reset (unseeded) for a new episode
        episode_steps = 0
env.close()