DefaultPolicy
- class maze.core.agent.default_policy.DefaultPolicy(policies: List[None | Mapping[str, Any] | Any] | Mapping[str | Type, None | Mapping[str, Any] | Any])
Encapsulates one or more policies identified by policy IDs.
- Parameters:
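policies – Dict mapping policy IDs to the corresponding policies. Per the signature, a list is also accepted, and each entry may be a policy object or a configuration mapping.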
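The `policies` argument can mix ready-made policy objects with configuration mappings. The sketch below illustrates that idea in plain Python; it is not Maze's implementation. `ConstantPolicy` and `build_policies` are hypothetical, and treating a config mapping as constructor keyword arguments is an assumption made only for illustration.

```python
import collections.abc
from typing import Any, Dict, Mapping, Union


class ConstantPolicy:
    """Hypothetical flat policy that always returns the same action."""

    def __init__(self, action: int) -> None:
        self.action = action

    def compute_action(self, observation: Dict[str, Any]) -> Dict[str, int]:
        return {"action": self.action}


def build_policies(
    policies: Mapping[str, Union[Mapping[str, Any], ConstantPolicy]]
) -> Dict[str, ConstantPolicy]:
    """Hypothetical helper: instantiate config mappings, pass policy objects through."""
    built: Dict[str, ConstantPolicy] = {}
    for policy_id, entry in policies.items():
        if isinstance(entry, collections.abc.Mapping):
            # Assumption: a config mapping holds the constructor keyword arguments.
            built[policy_id] = ConstantPolicy(**entry)
        else:
            built[policy_id] = entry
    return built


# One policy given as config, one as an already constructed object.
policies = build_policies({"pick": {"action": 0}, "place": ConstantPolicy(1)})
```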
- compute_action(observation: Dict[str, numpy.ndarray], maze_state: Any | None = None, env: BaseEnv | None = None, actor_id: ActorID | None = None, deterministic: bool = False) → Dict[str, int | numpy.ndarray]
(overrides Policy) Implementation of the Policy interface.
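Since DefaultPolicy encapsulates several policies, computing an action amounts to selecting the sub-policy for the acting actor and delegating to it. The sketch below shows that routing in plain Python. `EchoPolicy` and the free function `compute_action` are hypothetical, and the `ActorID` stand-in with `step_key` and `agent_id` fields, as well as keying policies by `step_key`, are assumptions rather than confirmed Maze behavior.

```python
from typing import Any, Dict, NamedTuple, Union


class ActorID(NamedTuple):
    """Stand-in for Maze's ActorID (assumed fields: step_key, agent_id)."""
    step_key: Union[str, int]
    agent_id: int


class EchoPolicy:
    """Hypothetical flat policy that reports which policy answered and how."""

    def __init__(self, name: str) -> None:
        self.name = name

    def compute_action(self, observation: Dict[str, Any],
                       deterministic: bool = False) -> Dict[str, Any]:
        return {"policy": self.name, "deterministic": deterministic}


def compute_action(policies: Dict[Union[str, int], EchoPolicy],
                   observation: Dict[str, Any],
                   actor_id: ActorID,
                   deterministic: bool = False) -> Dict[str, Any]:
    # Route to the sub-policy registered under the actor's step key (assumption),
    # then forward the observation and the deterministic flag unchanged.
    return policies[actor_id.step_key].compute_action(observation, deterministic=deterministic)


policies = {"pick": EchoPolicy("pick"), "place": EchoPolicy("place")}
result = compute_action(policies, {}, ActorID("place", 0), deterministic=True)
print(result)  # {'policy': 'place', 'deterministic': True}
```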
- compute_top_action_candidates(observation: Dict[str, numpy.ndarray], num_candidates: int | None, maze_state: Any | None, env: BaseEnv | None, actor_id: ActorID | None = None) → Tuple[Sequence[Dict[str, int | numpy.ndarray]], Sequence[float]]
(overrides Policy) Implementation of the Policy interface.
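The return type pairs a sequence of candidate actions with a sequence of float scores. The sketch below shows what a sub-policy might return for a single discrete action; `TableFlatPolicy` and its fixed score table are hypothetical, and reading `num_candidates=None` as "return all candidates" is an assumption suggested only by the `int | None` type.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple


class TableFlatPolicy:
    """Hypothetical flat policy with a fixed score per discrete action."""

    def __init__(self, scores: Sequence[float]) -> None:
        self.scores = list(scores)

    def compute_top_action_candidates(
        self, observation: Dict[str, Any], num_candidates: Optional[int]
    ) -> Tuple[List[Dict[str, int]], List[float]]:
        # Rank action indices by score, highest first.
        ranked = sorted(range(len(self.scores)), key=lambda a: self.scores[a], reverse=True)
        if num_candidates is not None:  # assumption: None means "all candidates"
            ranked = ranked[:num_candidates]
        actions = [{"action": a} for a in ranked]
        return actions, [self.scores[a] for a in ranked]


policy = TableFlatPolicy([0.1, 0.6, 0.3])
actions, scores = policy.compute_top_action_candidates({}, num_candidates=2)
print(actions, scores)  # [{'action': 1}, {'action': 2}] [0.6, 0.3]
```

A DefaultPolicy-style wrapper would forward such a call to whichever sub-policy serves the given actor.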
- needs_state() → bool
(overrides Policy) Returns False: this policy does not require the state() object to compute the action.
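A caller can use needs_state() to avoid fetching the environment state when the policy will ignore it. The sketch below is a hypothetical caller: `StatelessPolicy`, `act`, and the `get_state` callback are illustrative names, with `StatelessPolicy.needs_state()` returning False to mirror the behavior documented here.

```python
from typing import Any, Callable, Dict, Optional


class StatelessPolicy:
    """Hypothetical policy whose needs_state() returns False, as documented above."""

    def needs_state(self) -> bool:
        return False

    def compute_action(self, observation: Dict[str, Any],
                       maze_state: Optional[Any] = None) -> Dict[str, Any]:
        return {"action": 0, "got_state": maze_state is not None}


def act(policy: StatelessPolicy, observation: Dict[str, Any],
        get_state: Callable[[], Any]) -> Dict[str, Any]:
    # Only fetch the (possibly expensive) state when the policy asks for it.
    maze_state = get_state() if policy.needs_state() else None
    return policy.compute_action(observation, maze_state=maze_state)


calls = []


def get_state() -> str:
    calls.append("state")  # record that the state was requested
    return "S"


result = act(StatelessPolicy(), {}, get_state)
print(result, calls)  # {'action': 0, 'got_state': False} []
```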
- policy_for(actor_id: ActorID | None) → FlatPolicy
Return the policy corresponding to the given actor ID, or the single available policy if no actor ID is provided.
- Parameters:
actor_id – Actor ID to get the policy for; may be None if only one policy is available
- Returns:
Flat policy corresponding to the actor ID
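The lookup rule above can be sketched as follows. This is an illustration, not Maze's code: the `ActorID` stand-in, keying policies by `step_key`, and raising `ValueError` when no actor ID is given but several policies exist are all assumptions chosen to make the "single available policy" case explicit.

```python
from typing import Dict, NamedTuple, Optional, Union


class ActorID(NamedTuple):
    """Stand-in for Maze's ActorID (assumed fields: step_key, agent_id)."""
    step_key: Union[str, int]
    agent_id: int


def policy_for(policies: Dict[Union[str, int], object],
               actor_id: Optional[ActorID]) -> object:
    """Hypothetical lookup mirroring the documented rule."""
    if actor_id is None:
        # Without an actor ID the choice is only unambiguous if exactly one policy exists.
        if len(policies) != 1:
            raise ValueError("no actor ID given but multiple policies are available")
        return next(iter(policies.values()))
    # Assumption: policies are keyed by the actor's step key.
    return policies[actor_id.step_key]


single = {"main": "main-policy"}
multi = {"pick": "pick-policy", "place": "place-policy"}
print(policy_for(single, None))                # main-policy
print(policy_for(multi, ActorID("place", 0)))  # place-policy
```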