{"id":7187,"library":"dm-env","title":"dm-env","description":"dm-env is a Python interface for Reinforcement Learning (RL) environments, providing a foundational API for interacting with environments. It is actively maintained by Google DeepMind and is currently at version 1.6. The library focuses on core components such as `Environment`, `TimeStep`, and `specs` for defining actions, observations, rewards, and discounts.","status":"active","version":"1.6","language":"en","source_language":"en","source_url":"https://github.com/google-deepmind/dm_env","tags":["reinforcement-learning","environment","deepmind","api","machine-learning"],"install":[{"cmd":"pip install dm-env","lang":"bash","label":"Install stable release"}],"dependencies":[{"reason":"Core data structures (observations, actions, specs) are based on NumPy arrays.","package":"numpy","optional":false}],"imports":[{"symbol":"Environment","correct":"from dm_env import Environment"},{"symbol":"TimeStep","correct":"from dm_env import TimeStep"},{"symbol":"StepType","correct":"from dm_env import StepType"},{"symbol":"specs","correct":"from dm_env import specs"},{"symbol":"Array","correct":"from dm_env.specs import Array"},{"symbol":"BoundedArray","correct":"from dm_env.specs import BoundedArray"}],"quickstart":{"code":"import numpy as np\nfrom dm_env import Environment, TimeStep, specs, StepType\n\nclass SimpleCountingEnv(Environment):\n    def __init__(self, max_count=5):\n        self._max_count = max_count\n        self._current_count = 0\n        self._reset_next_step = True\n\n    def discount_spec(self):\n        return specs.BoundedArray(shape=(), dtype=float, minimum=0.0, maximum=1.0, name='discount')\n\n    def observation_spec(self):\n        return specs.BoundedArray(shape=(), dtype=int, minimum=0, maximum=self._max_count, name='count')\n\n    def action_spec(self):\n        return specs.BoundedArray(shape=(), dtype=int, minimum=0, maximum=1, name='action')  # 0: no-op, 1: increment\n\n    def 
reward_spec(self):\n        return specs.Array(shape=(), dtype=float, name='reward')\n\n    def _reset(self):\n        self._current_count = 0\n        self._reset_next_step = False\n        return TimeStep(step_type=StepType.FIRST,\n                        reward=None,\n                        discount=None,\n                        observation=np.asarray(self._current_count, dtype=int))\n\n    def _step(self, action):\n        if self._reset_next_step:\n            return self._reset()\n\n        if action == 1:\n            self._current_count += 1\n\n        if self._current_count >= self._max_count:\n            self._reset_next_step = True\n            return TimeStep(step_type=StepType.LAST,\n                            reward=np.asarray(1.0, dtype=float),\n                            discount=np.asarray(0.0, dtype=float),\n                            observation=np.asarray(self._current_count, dtype=int))\n        else:\n            return TimeStep(step_type=StepType.MID,\n                            reward=np.asarray(0.0, dtype=float),\n                            discount=np.asarray(1.0, dtype=float),\n                            observation=np.asarray(self._current_count, dtype=int))\n\n    def reset(self):\n        return self._reset()\n\n    def step(self, action):\n        return self._step(action)\n\n# --- Example Usage ---\nenv = SimpleCountingEnv()\n\ntimestep = env.reset()\nprint(f\"Initial: {timestep.observation}\")\n\nwhile not timestep.last():\n    action = 1 # Always try to increment\n    timestep = env.step(action)\n    print(f\"Step {env._current_count}: Obs={timestep.observation}, Reward={timestep.reward}, Type={timestep.step_type.name}\")\n\n# Demonstrating reset after LAST timestep\ntimestep = env.step(0) # Action is ignored here\nprint(f\"After last, calling step (action ignored): {timestep.observation}\")\n","lang":"python","description":"This quickstart defines a simple counting environment using `dm-env`'s `Environment` abstract base 
class. It showcases how to define action, observation, reward, and discount specifications using `dm_env.specs`, implement `_reset` and `_step` methods, and interact with the environment through `reset()` and `step()` calls. The example also highlights the `TimeStep` object and its `step_type` attribute for managing episode progression."},"warnings":[{"fix":"Ensure your Python environment is 3.6 or newer. Upgrade Python if necessary.","message":"From `dm-env` version 1.4 onwards, the library officially supports Python 3.6 and newer. Older Python versions are not supported for recent releases.","severity":"breaking","affected_versions":">=1.4"},{"fix":"Always call `env.reset()` explicitly to start a new sequence if you intend to begin an episode. Be aware that the first action after `StepType.LAST` will be disregarded.","message":"Calling `env.step(action)` on a newly created environment instance or immediately after a `TimeStep` with `StepType.LAST` will implicitly trigger a reset, and the provided `action` argument will be ignored. The environment will return a `TimeStep` with `StepType.FIRST`.","severity":"gotcha","affected_versions":"all"},{"fix":"When processing `TimeStep` objects, check `timestep.step_type == StepType.LAST` to identify the end of an episode, rather than relying solely on `timestep.discount == 0.0`. A discount can be 0.0 mid-sequence.","message":"A `discount` value of 0.0 in a `TimeStep` does not necessarily signify the end of a sequence (episode). The `StepType` enum (specifically `StepType.LAST`) should be used to determine if a sequence has terminated.","severity":"gotcha","affected_versions":"all"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Manually implement sampling logic based on the `dtype`, `shape`, `minimum`, and `maximum` properties of the spec. 
For example, use `np.random.uniform(spec.minimum, spec.maximum, size=spec.shape).astype(spec.dtype)` for a `BoundedArray`, or call `spec.generate_value()` to obtain a conforming placeholder value.","cause":"Unlike some other RL frameworks (e.g., OpenAI Gym spaces), `dm_env.specs.Array` and `BoundedArray` do not provide a `.sample()` method for generating random values within their defined bounds. Users often expect this functionality in order to easily sample actions or observations.","error":"AttributeError: 'Array' object has no attribute 'sample'"},{"fix":"Monitor the memory usage of your agent and environment. Reduce replay buffer size, optimize observation/action data types (e.g., use `np.uint8` instead of `np.float32` where appropriate), or consider techniques such as experience replay compression.","cause":"While not directly a `dm-env` issue, larger RL setups built on `dm-env` environments, especially those with extensive replay buffers or complex observation spaces, can lead to high memory consumption and Out-Of-Memory (OOM) errors. This is particularly common in deep RL agent implementations.","error":"MemoryError / Detected OOM-kill event(s)"},{"fix":"Use a compatibility wrapper library such as `shimmy` (e.g., `shimmy.DmControlCompatibilityV0` for `dm-control` environments, which use `dm-env` internally) to convert `dm-env` environments to the Gymnasium interface if you need Gymnasium-compatible agents or tools. Alternatively, implement a custom adapter or wrapper.","cause":"The `dm-env` interface is distinct from the popular Gymnasium (formerly OpenAI Gym) API. Directly using environments implemented with `dm-env` alongside agents designed for Gymnasium, or vice versa, requires conversion or wrappers.","error":"Incompatibility with Gymnasium/Gym-style environments or agents"}]}