class maze.core.env.core_env.CoreEnv

Interface definition for core environments forming the basis for actual RL trainable environments.

abstract actor_id() → maze.core.env.structured_env.ActorID

(overrides StructuredEnv)

Returns the currently executed actor. The id is unique only with respect to the policies (every policy has its own agent 0).

Note that identities of done actors cannot be reused in the same rollout.


:return The current actor, as a named tuple holding step_key and agent_id.
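
As a rough illustration of the returned value, the sketch below uses a local stand-in for ActorID (a named tuple with the two documented fields, step_key and agent_id). The class TwoStepEnvSketch and its advance() helper are hypothetical and do not derive from the real CoreEnv.

```python
from typing import NamedTuple, Union


class ActorID(NamedTuple):
    """Local stand-in mirroring the documented fields of ActorID."""
    step_key: Union[str, int]
    agent_id: int


class TwoStepEnvSketch:
    """Hypothetical env that alternates between sub-steps 0 and 1."""

    def __init__(self) -> None:
        self._sub_step = 0

    def actor_id(self) -> ActorID:
        # One agent per sub-step here, so the agent id is always 0.
        return ActorID(step_key=self._sub_step, agent_id=0)

    def advance(self) -> None:
        # Hypothetical helper: switch to the other sub-step.
        self._sub_step = 1 - self._sub_step


env = TwoStepEnvSketch()
first = env.actor_id()
env.advance()
second = env.actor_id()
```

Both sub-steps report agent_id 0, which matches the remark that every policy has its own agent 0.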

abstract property agent_counts_dict

(overrides StructuredEnv)

Returns the maximum count of agents per sub-step that the environment features.

For example:
  • For a vehicle-routing environment where 5 driver agents will get to act during sub-step 0, this method should return {0: 5}

  • For a two-step cutting environment where a piece is selected during sub-step 0 and then cut during sub-step 1 (with just one selection and cut happening in each step), this method should return {0: 1, 1: 1}
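
The two examples above could be written roughly as follows. This is a minimal sketch: VehicleRoutingSketch and CuttingSketch are hypothetical classes that only show the shape of the returned dictionary and do not implement the rest of the CoreEnv interface.

```python
from typing import Dict, Union


class VehicleRoutingSketch:
    """Hypothetical env: 5 driver agents act during sub-step 0."""

    @property
    def agent_counts_dict(self) -> Dict[Union[str, int], int]:
        return {0: 5}


class CuttingSketch:
    """Hypothetical env: one selection (sub-step 0), then one cut (sub-step 1)."""

    @property
    def agent_counts_dict(self) -> Dict[Union[str, int], int]:
        return {0: 1, 1: 1}


routing_counts = VehicleRoutingSketch().agent_counts_dict
cutting_counts = CuttingSketch().agent_counts_dict
```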

clone_from(env: maze.core.env.core_env.CoreEnv) → None

Implementation of SimulatedEnvMixin.

Cloning ‘self.context’ and ‘self.reward_aggregator’ is no longer required here, as MazeEnv already does this by default.

abstract close() → None

(overrides StructuredEnv)

Performs any necessary cleanup.


(overrides TimeEnvMixin)

Return ID of the current step as env time.

get_kpi_calculator() → Optional[maze.core.log_events.kpi_calculator.KpiCalculator]

(overrides EventEnvMixin)

By default, Core Envs do not have to support KPIs.

abstract get_maze_state() → Any

Return current state of the environment.

:return The same state as returned by reset().

abstract get_renderer() → maze.core.rendering.renderer.Renderer

Return renderer instance that can be used to render the env.

:return Renderer instance

abstract get_serializable_components() → Dict[str, Any]

(overrides SerializableEnvMixin)

List components that should be serialized as part of trajectory data.

get_step_events() → Iterable[]

(overrides EventEnvMixin)

Get all events recorded in the current step from the EventService.

:return An iterable of the recorded events.

abstract is_actor_done() → bool

(overrides StructuredEnv)

Returns True if the actor that was just stepped is done. This is distinct from the done flag of the environment.


:return True if the actor is done.
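
To make the difference between the two flags concrete, here is a small sketch under assumed semantics: each agent has a fixed step budget, an actor is done once its own budget is used up, and the environment is done only when all budgets are. BudgetEnvSketch and step_agent() are hypothetical and are not part of the CoreEnv interface.

```python
from typing import Dict, Optional


class BudgetEnvSketch:
    """Hypothetical env in which every agent has a fixed step budget."""

    def __init__(self, budgets: Dict[int, int]) -> None:
        self._budgets = dict(budgets)  # agent_id -> remaining steps
        self._last_agent: Optional[int] = None

    def step_agent(self, agent_id: int) -> bool:
        """Hypothetical helper: use one step of budget, return the env done flag."""
        self._budgets[agent_id] -= 1
        self._last_agent = agent_id
        return all(b == 0 for b in self._budgets.values())

    def is_actor_done(self) -> bool:
        # Refers only to the actor that was stepped last.
        return self._budgets[self._last_agent] == 0


env = BudgetEnvSketch({0: 1, 1: 2})
env_done = env.step_agent(0)
actor_done = env.is_actor_done()
```

After agent 0 uses its single step, the actor is done while the environment is not, since agent 1 still has budget left.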

abstract reset() → Any

(overrides StructuredEnv)

Reset the environment and return initial state.


:return The initial state after resetting.

abstract seed(seed: int) → None

(overrides StructuredEnv)

Sets the seed for this environment’s random number generator(s).


:param seed: The seed integer initializing the random number generator.
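
One possible way to implement seeding is to keep a private generator per environment and re-create it in seed(). The sketch below assumes Python's standard random module; SeededEnvSketch and sample_demand() are hypothetical, and a real core environment may use a different generator (for example a NumPy one).

```python
import random


class SeededEnvSketch:
    """Hypothetical env that owns its random number generator."""

    def __init__(self) -> None:
        self._rng = random.Random()

    def seed(self, seed: int) -> None:
        # Re-create the generator so all later draws depend only on the seed.
        self._rng = random.Random(seed)

    def sample_demand(self) -> int:
        # Hypothetical stochastic quantity drawn from the env's own generator.
        return self._rng.randint(0, 9)


env_a, env_b = SeededEnvSketch(), SeededEnvSketch()
env_a.seed(42)
env_b.seed(42)
draws_a = [env_a.sample_demand() for _ in range(5)]
draws_b = [env_b.sample_demand() for _ in range(5)]
```

Two environments seeded with the same integer produce the same sequence of draws, which is what makes rollouts reproducible.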

abstract step(maze_action: Any) → Tuple[Any, Union[float, numpy.ndarray, Any], bool, Dict[Any, Any]]

(overrides StructuredEnv)

Environment step function.

Note: If your core environment is structured, you should call maze.core.env.environment_context.EnvironmentContext.increment_env_step() once the structured step is done, so that the env time is incremented and events/stats cleared.

:param maze_action: Environment MazeAction to take.

:return state, reward, done, info
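
The reset/step contract can be sketched with a toy environment. CountdownEnvSketch below is hypothetical, does not inherit from CoreEnv, and uses a plain integer as both state and MazeAction. It is not structured, so the call to increment_env_step() mentioned in the note above is left out.

```python
from typing import Any, Dict, Tuple


class CountdownEnvSketch:
    """Hypothetical env: count down from a start value to zero."""

    def __init__(self, start: int = 3) -> None:
        self._start = start
        self._remaining = start

    def reset(self) -> int:
        self._remaining = self._start
        return self._remaining

    def step(self, maze_action: int) -> Tuple[int, float, bool, Dict[Any, Any]]:
        # The action is the number of units to subtract from the counter.
        self._remaining = max(0, self._remaining - maze_action)
        reward = -1.0  # constant cost per step
        done = self._remaining == 0
        return self._remaining, reward, done, {}


env = CountdownEnvSketch(start=3)
state = env.reset()
total_reward, done, n_steps = 0.0, False, 0
while not done:
    state, reward, done, info = env.step(1)
    total_reward += reward
    n_steps += 1
```

The loop unpacks the returned tuple in the documented order: state, reward, done, info.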