DefaultPolicy

class maze.core.agent.default_policy.DefaultPolicy(policies: List[None | Mapping[str, Any] | Any] | Mapping[str | Type, None | Mapping[str, Any] | Any])

Encapsulates one or more policies identified by policy IDs.

Parameters:

policies – Mapping of policy IDs to the corresponding policies. Per the signature, a list of policies is also accepted.

compute_action(observation: Dict[str, numpy.ndarray], maze_state: Any | None = None, env: BaseEnv | None = None, actor_id: ActorID | None = None, deterministic: bool = False) → Dict[str, int | numpy.ndarray]

(overrides Policy)

Implementation of the Policy interface.

compute_top_action_candidates(observation: Dict[str, numpy.ndarray], num_candidates: int | None, maze_state: Any | None, env: BaseEnv | None, actor_id: ActorID | None = None) → Tuple[Sequence[Dict[str, int | numpy.ndarray]], Sequence[float]]

(overrides Policy)

Implementation of the Policy interface.

needs_state() → bool

(overrides Policy)

This policy does not require the state() object to compute the action.

policy_for(actor_id: ActorID | None) → FlatPolicy

Return the policy corresponding to the given actor ID (or the single available policy if no actor ID is provided).

Parameters:

actor_id – Actor ID to get the policy for

Returns:

Flat policy corresponding to the actor ID
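
The lookup described above can be illustrated with a small stand-in. This is a sketch under stated assumptions, not Maze's actual implementation: PolicyRouterSketch and ConstantPolicy are hypothetical names, the ActorID below is a simplified substitute with assumed fields (step key and agent ID), and keying the policies by step key is an assumption made for illustration.

```python
from typing import Any, Dict, Mapping, NamedTuple, Optional, Union


class ActorID(NamedTuple):
    """Stand-in for Maze's ActorID (assumed fields: step key and agent ID)."""
    step_key: Union[str, int]
    agent_id: int


class ConstantPolicy:
    """Hypothetical flat policy that always returns the same action."""

    def __init__(self, action: Dict[str, int]):
        self.action = action

    def compute_action(self, observation: Dict[str, Any]) -> Dict[str, int]:
        return dict(self.action)


class PolicyRouterSketch:
    """Hypothetical illustration of per-actor policy lookup."""

    def __init__(self, policies: Mapping[Union[str, int], ConstantPolicy]):
        self.policies = dict(policies)

    def policy_for(self, actor_id: Optional[ActorID]) -> ConstantPolicy:
        if actor_id is None:
            # Without an actor ID the choice is only unambiguous
            # when exactly one policy is registered.
            if len(self.policies) != 1:
                raise ValueError("actor_id is required when multiple policies are present")
            return next(iter(self.policies.values()))
        # Assumption: policies are keyed by the actor's step key.
        return self.policies[actor_id.step_key]

    def compute_action(self, observation: Dict[str, Any],
                       actor_id: Optional[ActorID] = None) -> Dict[str, int]:
        # Delegate to whichever flat policy is responsible for this actor.
        return self.policy_for(actor_id).compute_action(observation)


# Two policies: an actor ID is needed to pick one.
router = PolicyRouterSketch({
    "move": ConstantPolicy({"direction": 1}),
    "attack": ConstantPolicy({"target": 3}),
})
action = router.compute_action({}, actor_id=ActorID("attack", 0))

# One policy: the actor ID may be omitted.
single = PolicyRouterSketch({0: ConstantPolicy({"direction": 2})})
default_action = single.compute_action({})
```

In this sketch, omitting the actor ID while several policies are registered raises an error rather than picking one arbitrarily; whether DefaultPolicy behaves the same way in that case is not stated in this reference.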

seed(seed: int) → None

(overrides Policy)

Not applicable, since the global seed should already be set before the models are initialized.
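
The ordering implied above (seed globally first, then build the models) can be sketched as follows. This is a minimal illustration only: TinyModel, SeedNoOpPolicy, and build_policy are hypothetical names, and Python's stdlib random module stands in for whatever random number generator the model framework actually uses.

```python
import random


class TinyModel:
    """Hypothetical model whose initial weights are drawn at construction time."""

    def __init__(self, size: int = 4):
        self.weights = [random.random() for _ in range(size)]


class SeedNoOpPolicy:
    """Hypothetical policy whose seed() is a no-op, as described above."""

    def __init__(self):
        self.model = TinyModel()

    def seed(self, seed: int) -> None:
        # Intentionally empty: the model weights were already drawn in __init__,
        # so seeding afterwards cannot change them.
        pass


def build_policy(global_seed: int) -> SeedNoOpPolicy:
    random.seed(global_seed)   # set the global seed first ...
    return SeedNoOpPolicy()    # ... then initialize the model


first = build_policy(1234)
second = build_policy(1234)
weights_before = list(first.model.weights)
first.seed(999)  # has no effect on the already-initialized weights
```

Both policies built with the same global seed end up with identical initial weights in this sketch, and the later seed() call leaves them unchanged.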