MazeEnv
class maze.core.env.maze_env.MazeEnv(*args, **kwds)

Base class for (gym-style) environments wrapping a core environment and defining state and execution interfaces. The aim of this class is to provide reusable functionality across different gym environments, such as the reset, step, and render functions.

- Parameters
  - core_env – Core environment.
  - action_conversion_dict – A dictionary mapping policy names to action conversion interface implementations.
  - observation_conversion_dict – A dictionary mapping policy names to observation conversion interface implementations.
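
A minimal construction sketch based on the parameter list above; MyCoreEnv, MyActionConversion, and MyObservationConversion are hypothetical placeholders for a concrete core environment and conversion interface implementations:

    from maze.core.env.maze_env import MazeEnv

    # Hypothetical core env and conversion interfaces -- substitute concrete implementations.
    core_env = MyCoreEnv()

    env = MazeEnv(
        core_env=core_env,
        # Both dicts map policy names to the conversion interface used for that policy.
        action_conversion_dict={"policy_0": MyActionConversion()},
        observation_conversion_dict={"policy_0": MyObservationConversion()},
    )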
- property action_conversion
  Return the action conversion mapping for the current policy.

- action_conversion_dict
  The action conversion mapping used by this env.

- property action_space
  Keep this env compatible with the gym interface by returning the action space of the current policy.

- property action_spaces_dict  (overrides StructuredEnvSpacesMixin)
  Policy action spaces as dict.
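
The two properties above expose the same information at different granularity; a short sketch, assuming env is a MazeEnv instance as constructed above:

    # Gym-compatible view: the action space of the currently active policy.
    current_space = env.action_space

    # Structured view: all per-policy action spaces, keyed by policy name.
    for policy_name, space in env.action_spaces_dict.items():
        print(policy_name, space)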
- actor_id() → maze.core.env.structured_env.ActorID  (overrides StructuredEnv)
  Forward the call to self.core_env.

- property agent_counts_dict  (overrides StructuredEnv)
  Forward the call to self.core_env.

- clone_from(env: maze.core.env.maze_env.MazeEnv) → None  (overrides SimulatedEnvMixin)
  Reset the maze env to the state of the provided env. Note that this also clones the CoreEnv and its member variables, including the environment context.
  - param env: The environment to clone from.
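
A sketch of using clone_from for simulated look-ahead; env and sim_env are assumed to be two compatible MazeEnv instances built from the same configuration:

    # Bring the simulation env into the exact state of the main env
    # (the CoreEnv and its context are cloned as well).
    sim_env.clone_from(env)

    # Roll out a candidate action in simulation without affecting the main env.
    candidate_action = sim_env.action_space.sample()
    obs, reward, done, info = sim_env.step(candidate_action)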
- close() → None  (overrides BaseEnv)
  Forward the call to self.core_env.

- get_actor_rewards() → Optional[numpy.ndarray]  (overrides StructuredEnv)
  Forward the call to self.core_env.

- get_env_time() → int  (overrides TimeEnvMixin)
  Forward the call to self.core_env.

- get_episode_id() → str  (overrides RecordableEnvMixin)
  Return the ID of the current episode (the ID changes on env reset).

- get_kpi_calculator() → Optional[maze.core.log_events.kpi_calculator.KpiCalculator]  (overrides CoreEnv)
  Forward the call to self.core_env.

- get_maze_action() → Any  (overrides RecordableEnvMixin)
  Return the last MazeAction object for trajectory recording.

- get_maze_state() → Any  (overrides RecordableEnvMixin)
  Return the current state object of the core env for trajectory recording.

- get_observation_and_action_dicts(maze_state: Optional[Any], maze_action: Optional[Any], first_step_in_episode: bool) → Tuple[Optional[Dict[Union[int, str], Any]], Optional[Dict[Union[int, str], Any]]]  (overrides Wrapper)
  Convert MazeState and MazeAction back into observations and actions using the space conversion interfaces.
  - param maze_state: State of the environment.
  - param maze_action: MazeAction (the one following the state given as the first param).
  - param first_step_in_episode: True if this is the first step in the episode.
  - return: Observation and action dictionaries (keys are substep_ids).
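
A sketch of converting recorded MazeState/MazeAction pairs back into the space representations, e.g. when replaying a trajectory for imitation learning; recorded_steps is a hypothetical list of (maze_state, maze_action) tuples:

    for i, (maze_state, maze_action) in enumerate(recorded_steps):
        obs_dict, act_dict = env.get_observation_and_action_dicts(
            maze_state=maze_state,
            maze_action=maze_action,
            first_step_in_episode=(i == 0),
        )
        # Both dicts are keyed by substep_id and hold the converted
        # observations and actions for the respective sub-step.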
- get_renderer() → maze.core.rendering.renderer.Renderer  (overrides RecordableEnvMixin)
  Return the renderer exposed by the underlying core env.

- get_step_events() → Iterable[maze.core.events.event_record.EventRecord]  (overrides CoreEnv)
  Forward the call to self.core_env.

- is_actor_done() → bool  (overrides StructuredEnv)
  Forward the call to self.core_env.

- maze_env
  Direct access to the maze env (useful to bypass the wrapper hierarchy).
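
A small sketch of bypassing the wrapper hierarchy; wrapped_env is assumed to be a stack of wrappers around a MazeEnv:

    # Access the underlying MazeEnv directly, skipping all wrappers in between.
    maze_env = wrapped_env.maze_env
    renderer = maze_env.get_renderer()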
- metadata
  Only there to be compatible with gym.core.Env.

- property observation_conversion
  Return the state to observation mapping for the current policy.

- observation_conversion_dict
  The observation conversion mapping used by this env.

- property observation_space
  Keep this env compatible with the gym interface by returning the observation space of the current policy.

- property observation_spaces_dict  (overrides StructuredEnvSpacesMixin)
  Policy observation spaces as dict.

- reset() → Dict[str, numpy.ndarray]  (overrides BaseEnv)
  Reset the environment and return the initial observation.
  - return: The initial observation after resetting.

- reward_range
  A tuple (reward min value, reward max value) to be compatible with gym.core.Env.

- seed(seed: Any) → None  (overrides BaseEnv)
  Forward the call to self.core_env.

- spec
  Only there to be compatible with gym.core.Env.

- step(action: Dict[str, Union[int, numpy.ndarray]]) → Tuple[Dict[str, numpy.ndarray], float, bool, Dict[Any, Any]]  (overrides BaseEnv)
  Take an environment step (see CoreEnv.step for details).
  - param action: The action the agent wants to take.
  - return: Observation, reward, done, info.
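
A minimal interaction loop sketch using only the members documented above; actions are sampled from the current policy's action space for illustration:

    obs = env.reset()
    done = False
    while not done:
        # Dictionary action for the currently active sub-step / policy.
        action = env.action_space.sample()
        obs, reward, done, info = env.step(action)

        # ActorID of the actor that acts in the upcoming sub-step.
        actor = env.actor_id()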