Wrapper

class maze.core.wrappers.wrapper.Wrapper(*args, **kwds)

A transparent environment Wrapper that works with any manifestation of BaseEnv. It is intended as a drop-in replacement for gym.core.Wrapper.

Gym Wrappers elegantly expose methods and attributes of all nested envs. However, wrapping destroys the class hierarchy, so querying the base classes is not straightforward. This environment wrapper fixes the behaviour of isinstance() for arbitrarily nested wrappers.

Suppose we want to check the base class:

class MyGymWrapper(Wrapper[gym.Env]):
    ...

# construct an env and wrap it
env = MyEnv()
env = MyGymWrapper(env)

# this assertion fails
assert isinstance(env, MyEnv) == True

TypingWrapper makes isinstance() work as intuitively expected:

# this time use MyWrapper, which is derived from this Wrapper class
env = MyEnv()
env = MyWrapper(env)

# now the assertions hold
assert isinstance(env, MyEnv) == True
assert isinstance(env, MyWrapper) == True

Note:

gym.core.Wrapper assumes the existence of certain attributes (action_space, observation_space, reward_range, metadata) and duplicates these attributes. This behaviour is unnecessary, because __getattr__ makes these members of the inner environment transparently available anyway.
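The two mechanisms described above (attribute forwarding through __getattr__ and an isinstance() that sees through nested wrappers) can be illustrated with a small, self-contained sketch. It is not the Maze implementation: TransparentWrapper, MyEnv, MyWrapper and OtherWrapper are hypothetical stand-ins, and overriding __class__ is only one possible way to obtain this behaviour for plain (non-ABC) classes.

```python
class MyEnv:
    """Hypothetical environment used only for this illustration."""

    action_space = ("left", "right")

    def step(self, action):
        return f"stepped with {action}"


class TransparentWrapper:
    """Illustrative stand-in; NOT the Maze Wrapper implementation."""

    def __init__(self, env):
        self.env = env

    def __getattr__(self, name):
        # Only invoked when regular lookup fails, so attributes defined on
        # the wrapper win and everything else is forwarded to the inner env.
        if name == "env":
            # Guards against infinite recursion if env has not been set yet.
            raise AttributeError(name)
        return getattr(self.env, name)

    @property
    def __class__(self):
        # isinstance() falls back to __class__ when type(obj) does not match.
        # Report a synthetic class deriving from the wrapper type and from
        # the (possibly synthetic) class reported by the wrapped env.
        wrapper_type = type(self)
        inner_class = self.env.__class__
        if issubclass(inner_class, wrapper_type):
            # Same wrapper type applied twice: the inner class already suffices.
            return inner_class
        return type(wrapper_type.__name__, (wrapper_type, inner_class), {})


class MyWrapper(TransparentWrapper):
    pass


class OtherWrapper(TransparentWrapper):
    pass


# Two nested wrappers around the hypothetical env.
env = OtherWrapper(MyWrapper(MyEnv()))

assert isinstance(env, MyEnv)
assert isinstance(env, MyWrapper)
assert isinstance(env, OtherWrapper)

# Members of the inner env are reachable without duplicating them.
assert env.action_space == ("left", "right")
```

Because the inner attributes are resolved lazily on every access, the wrapper never needs to copy action_space, observation_space or similar members at construction time.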

clone_from(env: EnvType) → None

(overrides SimulatedEnvMixin)

Implementation of SimulatedEnvMixin.

Note: implementing this method is required for stateful environment wrappers.
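As a rough illustration of what this means for a stateful wrapper, the sketch below copies the wrapper's own state and then delegates to the wrapped env. CounterEnv and StepCountWrapper are hypothetical stand-ins that do not depend on Maze, and the delegation via env.env is an assumption made for clarity rather than the library's prescribed pattern.

```python
class CounterEnv:
    """Hypothetical simulated env holding a single integer as its state."""

    def __init__(self):
        self.state = 0

    def step(self, action):
        self.state += action
        return self.state

    def clone_from(self, env):
        # Reset this env to the state of the given env.
        self.state = env.state


class StepCountWrapper:
    """Hypothetical stateful wrapper that counts the steps taken."""

    def __init__(self, env):
        self.env = env
        self.steps = 0

    def step(self, action):
        self.steps += 1
        return self.env.step(action)

    def clone_from(self, env):
        # Copy the wrapper's own state first ...
        self.steps = env.steps
        # ... then let the wrapped env copy the state of its counterpart.
        self.env.clone_from(env.env)


source = StepCountWrapper(CounterEnv())
source.step(2)
source.step(3)

copy = StepCountWrapper(CounterEnv())
copy.clone_from(source)

assert copy.steps == 2
assert copy.env.state == 5
```

If the wrapper skipped copying its own state, the clone would continue with a step count of zero and diverge from the source env.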

get_observation_and_action_dicts(maze_state: Optional[Any], maze_action: Optional[Any], first_step_in_episode: bool) → Tuple[Optional[Dict[Union[int, str], Any]], Optional[Dict[Union[int, str], Any]]]

Convert MazeState and MazeAction back into raw action and observation.

This method is mostly used when working with trajectory data, e.g. for imitation learning. As part of trajectory data, MazeStates and MazeActions are recorded. For imitation learning, they then need to be converted to raw observations and actions in the desired format (i.e. using all the required wrappers etc.).

The conversion is done by first transforming the MazeState and MazeAction using the space interfaces in MazeEnv, and then running them through the entire wrapper stack (“back up”).

Both the MazeState and the MazeAction on top of it are converted as part of this single method, because some wrappers (mostly multi-step ones) need both together to be able to split them into observations and actions taken in different sub-steps. If you are not using multi-step wrappers, you don’t need to convert both MazeState and MazeAction; you can pass in just one of them. Note, however, that not all wrappers support this.

See below for an example implementation.

Note: The conversion of MazeState to observation is in the “natural” direction, how it takes place when stepping the env. This is not true for the MazeAction to action conversion – when stepping the env, actions are converted to MazeActions, whereas here the MazeAction needs to be converted back into the “raw” action (i.e. in reverse direction).

(!) Attention: If there are stateful wrappers in the wrapper stack (e.g. a wrapper stacking observations from previous steps), you should ensure that (1) the first_step_in_episode flag is passed to this function correctly and (2) all states and MazeActions are converted in order, as they happened during the recorded episode.

Parameters
  • maze_state – MazeState to convert. If None, only MazeAction will be converted (not all wrappers support this).

  • maze_action – MazeAction (the one following the state given as the first param). If None, only MazeState will be converted (not all wrappers support this, some need both).

  • first_step_in_episode – True if this is the first step in the episode. Serves to notify stateful wrappers (e.g. observation stacking) that they should reset their state.

Returns

observation and action dictionaries (keys are IDs of sub-steps)
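A minimal sketch of how an observation-modifying, stateful wrapper might implement this method is given below. InnerEnvStub and RunningSumWrapper are hypothetical stand-ins (they are not part of Maze), and the stub replaces the real conversion through the MazeEnv space interfaces with a trivial one. The sketch shows the delegation to the wrapped env, the handling of a missing observation dictionary, and the state reset triggered by first_step_in_episode.

```python
from typing import Any, Dict, Optional, Tuple, Union

StepKey = Union[int, str]
DictPair = Tuple[Optional[Dict[StepKey, Any]], Optional[Dict[StepKey, Any]]]


class InnerEnvStub:
    """Hypothetical stand-in for the wrapped env (not part of Maze)."""

    def get_observation_and_action_dicts(
        self, maze_state: Optional[Any], maze_action: Optional[Any], first_step_in_episode: bool
    ) -> DictPair:
        # Trivial conversion with a single sub-step whose ID is 0.
        obs = None if maze_state is None else {0: {"value": maze_state}}
        act = None if maze_action is None else {0: {"move": maze_action}}
        return obs, act


class RunningSumWrapper:
    """Hypothetical stateful wrapper adding a running sum to each observation."""

    def __init__(self, env: Any):
        self.env = env
        self._total = 0

    def observation(self, observation: Dict[str, Any]) -> Dict[str, Any]:
        self._total += observation["value"]
        return {**observation, "running_sum": self._total}

    def get_observation_and_action_dicts(
        self, maze_state: Optional[Any], maze_action: Optional[Any], first_step_in_episode: bool
    ) -> DictPair:
        if first_step_in_episode:
            self._total = 0  # a new recorded episode starts: reset wrapper state
        # Let the rest of the stack convert first, then apply this wrapper.
        obs_dict, act_dict = self.env.get_observation_and_action_dicts(
            maze_state, maze_action, first_step_in_episode
        )
        if obs_dict is not None:
            obs_dict = {key: self.observation(obs) for key, obs in obs_dict.items()}
        return obs_dict, act_dict


# Convert a recorded (state, action) trajectory in its original order.
wrapper = RunningSumWrapper(InnerEnvStub())
recorded = [(1, "up"), (2, "down")]
converted = [
    wrapper.get_observation_and_action_dicts(state, action, first_step_in_episode=(i == 0))
    for i, (state, action) in enumerate(recorded)
]
```

Converting the steps out of order, or omitting the first_step_in_episode flag, would leave the running sum inconsistent with what the wrapper produced while the episode was recorded.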

step_with_callbacks(*args, **kwargs) → Any

A wrapper for the env.step function. Checks whether callbacks for this step have already been processed (i.e., detects whether this is the outermost wrapper). Triggers the post-step callbacks if required.

classmethod wrap(env: T, **kwargs) → Union[T, WrapperType]

Creation method providing appropriate type hints. Preferred method to construct the wrapper compared to calling the class constructor directly.

Since we want to allow sub-classes to use .wrap() without having to reimplement it and still facilitate proper type hints, we use a generic to represent the type of cls. See https://stackoverflow.com/questions/39205527/can-you-annotate-return-type-when-value-is-instance-of-cls/39205612#39205612 on why/how to use this to indicate that an instance of cls is returned.

Parameters
  • env – The environment to be wrapped.

  • kwargs – Arguments to be passed on to the wrapper’s constructor.

Returns

A newly created wrapper instance.
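The typing pattern behind wrap() can be sketched as follows. SketchWrapper and ScalingWrapper are hypothetical classes used only for illustration; the sketch assumes nothing about the actual Maze constructor beyond an env argument followed by keyword arguments.

```python
from typing import Any, Type, TypeVar, Union

T = TypeVar("T")
# Bound type variable standing for "whatever class wrap() is called on".
WrapperType = TypeVar("WrapperType", bound="SketchWrapper")


class SketchWrapper:
    """Hypothetical base wrapper; not the Maze Wrapper class."""

    def __init__(self, env: Any, **kwargs: Any):
        self.env = env
        self.kwargs = kwargs

    @classmethod
    def wrap(cls: Type[WrapperType], env: T, **kwargs: Any) -> Union[T, WrapperType]:
        # Annotating cls with the type variable lets type checkers infer the
        # concrete sub-class without each sub-class overriding wrap().
        return cls(env, **kwargs)


class ScalingWrapper(SketchWrapper):
    """Hypothetical sub-class that inherits wrap() unchanged."""

    def __init__(self, env: Any, factor: float = 1.0):
        super().__init__(env, factor=factor)
        self.factor = factor


wrapped = ScalingWrapper.wrap(object(), factor=2.0)

assert isinstance(wrapped, ScalingWrapper)
assert wrapped.factor == 2.0
```

The Union in the return annotation tells the type checker that the result offers the interface of both the wrapped env and the wrapper, which matches the transparent attribute access described for this class.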