MazeEnv¶
- class maze.core.env.maze_env.MazeEnv(*args, **kwds)¶ Base class for (gym style) environments wrapping a core environment and defining state and execution interfaces. The aim of this class is to provide reusable functionality across different gym environments. This functionality comprises, for example, the reset, step, and render functions.
- Parameters
core_env – Core environment.
action_conversion_dict – A dictionary with policy names as keys and action conversion interface implementations as values.
observation_conversion_dict – A dictionary with policy names as keys and observation conversion interface implementations as values.
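A minimal construction sketch, assuming project-specific components; MyCoreEnv, MyActionConversion, MyObservationConversion, and the policy name "policy_0" are hypothetical placeholders, not part of this API:

    from maze.core.env.maze_env import MazeEnv

    # Hypothetical project-specific components standing in for a concrete
    # CoreEnv and the corresponding space conversion interfaces.
    from my_project.core_env import MyCoreEnv
    from my_project.space_interfaces import MyActionConversion, MyObservationConversion

    core_env = MyCoreEnv()
    env = MazeEnv(
        core_env=core_env,
        # keys are policy names, values are the conversion interface instances
        action_conversion_dict={"policy_0": MyActionConversion()},
        observation_conversion_dict={"policy_0": MyObservationConversion()},
    )

    # gym-compatible spaces of the currently active policy
    print(env.action_space)
    print(env.observation_space)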
- property action_conversion¶ Return the action conversion mapping for the current policy.
- action_conversion_dict¶ The action conversion mapping used by this env.
- property action_space¶ Keep this env compatible with the gym interface by returning the action space of the current policy.
- property action_spaces_dict¶ (overrides StructuredEnvSpacesMixin) Policy action spaces as dict.
- actor_id() → maze.core.env.structured_env.ActorID¶ (overrides StructuredEnv) Forward the call to self.core_env.
- property agent_counts_dict¶ (overrides StructuredEnv) Forward the call to self.core_env.
- clone_from(env: maze.core.env.maze_env.MazeEnv) → None¶ (overrides SimulatedEnvMixin) Reset the maze env to the state of the provided env.
Note that this also clones the CoreEnv and its member variables, including the environment context.
- param env
The environment to clone from.
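A hedged sketch of how clone_from might be used to keep a simulated copy in sync with a main environment; make_maze_env is a hypothetical factory that builds two structurally identical MazeEnv instances:

    # Hypothetical factory producing structurally identical MazeEnv instances.
    main_env = make_maze_env()
    sim_env = make_maze_env()

    obs = main_env.reset()

    # Synchronize the simulation with the main env, then try a candidate
    # action in the simulation only; the main env remains untouched.
    sim_env.clone_from(main_env)
    candidate = sim_env.action_space.sample()
    sim_obs, sim_reward, sim_done, sim_info = sim_env.step(candidate)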
- close() → None¶ (overrides BaseEnv) Forward the call to self.core_env.
- get_actor_rewards() → Optional[numpy.ndarray]¶ (overrides StructuredEnv) Forward the call to self.core_env.
- get_env_time() → int¶ (overrides TimeEnvMixin) Forward the call to self.core_env.
- get_episode_id() → str¶ (overrides RecordableEnvMixin) Return the ID of the current episode (the ID changes on env reset).
- get_kpi_calculator() → Optional[maze.core.log_events.kpi_calculator.KpiCalculator]¶ (overrides CoreEnv) Forward the call to self.core_env.
- get_maze_action() → Any¶ (overrides RecordableEnvMixin) Return the last MazeAction object for trajectory recording.
- get_maze_state() → Any¶ (overrides RecordableEnvMixin) Return the current state object of the core env for trajectory recording.
- get_observation_and_action_dicts(maze_state: Optional[Any], maze_action: Optional[Any], first_step_in_episode: bool) → Tuple[Optional[Dict[Union[int, str], Any]], Optional[Dict[Union[int, str], Any]]]¶ (overrides Wrapper) Convert MazeState and MazeAction back into observations and actions using the space conversion interfaces.
- param maze_state
State of the environment.
- param maze_action
MazeAction (the one following the state given as the first param).
- param first_step_in_episode
True if this is the first step in the episode.
- return
Observation and action dictionaries (keys are sub-step IDs).
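A sketch of converting a recorded state/action pair back into space objects, assuming env is a MazeEnv and the pair was captured during a rollout (e.g. via get_maze_state and get_maze_action):

    maze_state = env.get_maze_state()
    maze_action = env.get_maze_action()

    obs_dict, act_dict = env.get_observation_and_action_dicts(
        maze_state=maze_state,
        maze_action=maze_action,
        first_step_in_episode=True,
    )

    # Both dicts are keyed by sub-step ID; per the Optional return types,
    # either one may be None, hence the defensive handling below.
    for substep_id, obs in (obs_dict or {}).items():
        print(substep_id, type(obs))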
- get_renderer() → maze.core.rendering.renderer.Renderer¶ (overrides RecordableEnvMixin) Return the renderer exposed by the underlying core env.
- get_step_events() → Iterable[maze.core.events.event_record.EventRecord]¶ (overrides CoreEnv) Forward the call to self.core_env.
- is_actor_done() → bool¶ (overrides StructuredEnv) Forward the call to self.core_env.
- maze_env¶ Direct access to the maze env (useful to bypass the wrapper hierarchy).
- metadata¶ Only there to be compatible with gym.core.Env.
- noop_action() → Dict[str, Union[int, numpy.ndarray]]¶ (overrides Wrapper) Helper function for accessing the noop action for the current step, compatible with the Wrapper interface.
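A brief usage sketch, assuming the action conversion interface of the current sub-step actually defines a noop action:

    # Step the environment with the noop action of the current step,
    # i.e. without an actual agent decision.
    noop = env.noop_action()
    obs, reward, done, info = env.step(noop)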
- property observation_conversion¶ Return the state-to-observation mapping for the current policy.
- observation_conversion_dict¶ The observation conversion mapping used by this env.
- property observation_space¶ Keep this env compatible with the gym interface by returning the observation space of the current policy.
- property observation_spaces_dict¶ (overrides StructuredEnvSpacesMixin) Policy observation spaces as dict.
- reset() → Dict[str, numpy.ndarray]¶ (overrides BaseEnv) Resets the environment and returns the initial observation.
- return
The initial observation after resetting.
- reward_range¶ A tuple (reward min value, reward max value) to be compatible with gym.core.Env.
- seed(seed: Any) → None¶ (overrides BaseEnv) Forward the call to self.core_env.
- set_core_env(core_env: maze.core.env.core_env.CoreEnv) → None¶ Helper method for setting the core env to a new, different core env instance while maintaining the same core env context object (so as not to break event reporting, callbacks, etc.).
Helpful e.g. during deployment, when the simulation in the core env is not needed and we are just mirroring the production environment instead.
The old core env instance is no longer referenced and should be discarded.
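A hedged deployment sketch; MyProductionCoreEnv is a hypothetical CoreEnv subclass that mirrors the production system:

    production_core_env = MyProductionCoreEnv()  # hypothetical CoreEnv subclass

    # Swap the simulation core env for the production mirror while keeping the
    # existing core env context (event reporting, callbacks) intact.
    env.set_core_env(production_core_env)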
- spec¶ Only there to be compatible with gym.core.Env.
- step(action: Dict[str, Union[int, numpy.ndarray]]) → Tuple[Dict[str, numpy.ndarray], float, bool, Dict[Any, Any]]¶ (overrides BaseEnv) Take environment step (see CoreEnv.step for details).
- param action
The action the agent wants to take.
- return
observation, reward, done, info
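A minimal rollout sketch over one episode, assuming env is a constructed MazeEnv instance; random actions are used only for illustration:

    obs = env.reset()
    done = False
    total_reward = 0.0

    while not done:
        # Sample from the action space of the currently active policy / sub-step.
        action = env.action_space.sample()
        obs, reward, done, info = env.step(action)
        total_reward += reward

    env.close()
    print("episode return:", total_reward)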