MazeEnv¶
- class maze.core.env.maze_env.MazeEnv(*args, **kwds)¶ Base class for (gym style) environments wrapping a core environment and defining state and execution interfaces. The aim of this class is to provide reusable functionality across different gym environments. This functionality comprises, for example, the reset, step, and render functions.
- Parameters
core_env – Core environment.
action_conversion_dict – A dictionary with policy names as keys and action conversion interface implementations as values.
observation_conversion_dict – A dictionary with policy names as keys and observation conversion interface implementations as values.
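A minimal construction sketch, assuming project-specific components; MyCoreEnv, MyActionConversion, MyObservationConversion, and the policy name "policy_0" are hypothetical placeholders, not part of this API:

    from maze.core.env.maze_env import MazeEnv

    # Hypothetical project-specific components standing in for a concrete
    # CoreEnv and the corresponding space conversion interfaces.
    from my_project.core_env import MyCoreEnv
    from my_project.space_interfaces import MyActionConversion, MyObservationConversion

    core_env = MyCoreEnv()
    env = MazeEnv(
        core_env=core_env,
        # keys are policy names, values are the conversion interface instances
        action_conversion_dict={"policy_0": MyActionConversion()},
        observation_conversion_dict={"policy_0": MyObservationConversion()},
    )

    # gym-compatible spaces of the currently active policy
    print(env.action_space)
    print(env.observation_space)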
- property action_conversion¶ Return the action conversion mapping for the current policy.
- action_conversion_dict¶ The action conversion mapping used by this env.
- property action_space¶ Keep this env compatible with the gym interface by returning the action space of the current policy.
- property action_spaces_dict¶ (overrides StructuredEnvSpacesMixin) Policy action spaces as dict.
- actor_id() → maze.core.env.structured_env.ActorID¶ (overrides StructuredEnv) Forward the call to self.core_env.
- property agent_counts_dict¶ (overrides StructuredEnv) Forward the call to self.core_env.
- clone_from(env: maze.core.env.maze_env.MazeEnv) → None¶ (overrides SimulatedEnvMixin) Reset the maze env to the state of the provided env.
Note that this also clones the CoreEnv and its member variables, including the environment context.
- param env
The environment to clone from.
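A hedged sketch of how clone_from might be used to keep a simulated copy in sync with a main environment; make_maze_env is a hypothetical factory that builds two structurally identical MazeEnv instances:

    # Hypothetical factory producing structurally identical MazeEnv instances.
    main_env = make_maze_env()
    sim_env = make_maze_env()

    obs = main_env.reset()

    # Synchronize the simulation with the main env, then try a candidate
    # action in the simulation only; the main env remains untouched.
    sim_env.clone_from(main_env)
    candidate = sim_env.action_space.sample()
    sim_obs, sim_reward, sim_done, sim_info = sim_env.step(candidate)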
- close() → None¶ (overrides BaseEnv) Forward the call to self.core_env.
- get_actor_rewards() → Optional[numpy.ndarray]¶ (overrides StructuredEnv) Forward the call to self.core_env.
- get_env_time() → int¶ (overrides TimeEnvMixin) Forward the call to self.core_env.
- get_episode_id() → str¶ (overrides RecordableEnvMixin) Return the ID of the current episode (the ID changes on env reset).
- get_kpi_calculator() → Optional[maze.core.log_events.kpi_calculator.KpiCalculator]¶ (overrides CoreEnv) Forward the call to self.core_env.
- get_maze_action() → Any¶ (overrides RecordableEnvMixin) Return the last MazeAction object for trajectory recording.
- get_maze_state() → Any¶ (overrides RecordableEnvMixin) Return the current state object of the core env for trajectory recording.
- get_observation_and_action_dicts(maze_state: Optional[Any], maze_action: Optional[Any], first_step_in_episode: bool) → Tuple[Optional[Dict[Union[int, str], Any]], Optional[Dict[Union[int, str], Any]]]¶ (overrides Wrapper) Convert MazeState and MazeAction back into observations and actions using the space conversion interfaces.
- param maze_state
State of the environment.
- param maze_action
MazeAction (the one following the state given as the first param).
- param first_step_in_episode
True if this is the first step in the episode.
- return
Observation and action dictionaries (keys are sub-step IDs).
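A sketch of converting a recorded state/action pair back into space objects, assuming env is a MazeEnv and the pair was captured during a rollout (e.g. via get_maze_state and get_maze_action):

    maze_state = env.get_maze_state()
    maze_action = env.get_maze_action()

    obs_dict, act_dict = env.get_observation_and_action_dicts(
        maze_state=maze_state,
        maze_action=maze_action,
        first_step_in_episode=True,
    )

    # Both dicts are keyed by sub-step ID; per the Optional return types,
    # either one may be None, hence the defensive handling below.
    for substep_id, obs in (obs_dict or {}).items():
        print(substep_id, type(obs))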
- get_renderer() → maze.core.rendering.renderer.Renderer¶ (overrides RecordableEnvMixin) Return the renderer exposed by the underlying core env.
- get_step_events() → Iterable[maze.core.events.event_record.EventRecord]¶ (overrides CoreEnv) Forward the call to self.core_env.
- is_actor_done() → bool¶ (overrides StructuredEnv) Forward the call to self.core_env.
- maze_env¶ Direct access to the maze env (useful to bypass the wrapper hierarchy).
- metadata¶ Only there to be compatible with gym.core.Env.
- noop_action() → Dict[str, Union[int, numpy.ndarray]]¶ (overrides Wrapper) Helper function for accessing the noop action for the current step, compatible with the Wrapper interface.
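A brief usage sketch, assuming the action conversion interface of the current sub-step actually defines a noop action:

    # Step the environment with the noop action of the current step,
    # i.e. without an actual agent decision.
    noop = env.noop_action()
    obs, reward, done, info = env.step(noop)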
- property observation_conversion¶ Return the state-to-observation mapping for the current policy.
- observation_conversion_dict¶ The observation conversion mapping used by this env.
- property observation_space¶ Keep this env compatible with the gym interface by returning the observation space of the current policy.
- property observation_spaces_dict¶ (overrides StructuredEnvSpacesMixin) Policy observation spaces as dict.
- reset() → Dict[str, numpy.ndarray]¶ (overrides BaseEnv) Resets the environment and returns the initial observation.
- return
The initial observation after resetting.
- reward_range¶ A tuple (reward min value, reward max value) to be compatible with gym.core.Env.
- seed(seed: Any) → None¶ (overrides BaseEnv) Forward the call to self.core_env.
- set_core_env(core_env: maze.core.env.core_env.CoreEnv) → None¶ Helper method for setting the core env to a new, different core env instance while maintaining the same core env context object (so as not to break event reporting, callbacks, etc.).
Helpful e.g. during deployment, when the simulation in the core env is not needed and we are just mirroring the production environment instead.
The old core env instance is no longer referenced and should be discarded.
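A hedged deployment sketch; MyProductionCoreEnv is a hypothetical CoreEnv subclass that mirrors the production system:

    production_core_env = MyProductionCoreEnv()  # hypothetical CoreEnv subclass

    # Swap the simulation core env for the production mirror while keeping the
    # existing core env context (event reporting, callbacks) intact.
    env.set_core_env(production_core_env)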
- spec¶ Only there to be compatible with gym.core.Env.
- step(action: Dict[str, Union[int, numpy.ndarray]]) → Tuple[Dict[str, numpy.ndarray], float, bool, Dict[Any, Any]]¶ (overrides BaseEnv) Take environment step (see CoreEnv.step for details).
- param action
The action the agent wants to take.
- return
observation, reward, done, info
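A minimal rollout sketch over one episode, assuming env is a constructed MazeEnv instance; random actions are used only for illustration:

    obs = env.reset()
    done = False
    total_reward = 0.0

    while not done:
        # Sample from the action space of the currently active policy / sub-step.
        action = env.action_space.sample()
        obs, reward, done, info = env.step(action)
        total_reward += reward

    env.close()
    print("episode return:", total_reward)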