DefaultPolicy

Encapsulates one or more policies identified by policy IDs.

compute_action(observation: Dict[str, numpy.ndarray], maze_state: Any | None = None, env: BaseEnv | None = None, actor_id: ActorID | None = None, deterministic: bool = False) → Dict[str, int | numpy.ndarray]

(overrides Policy)

Implementation of the Policy interface.

needs_state() → bool

(overrides Policy)

This policy does not require the state object to compute an action.
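A caller can use this flag to avoid fetching the state when it is not needed. The sketch below is only an illustration under stated assumptions: `StatelessPolicy`, `step_policy`, and `expensive_state` are hypothetical stand-ins, not part of the library.

```python
from typing import Any, Callable, Dict


class StatelessPolicy:
    """Hypothetical policy that never needs the state object."""

    def needs_state(self) -> bool:
        return False

    def compute_action(self, observation: Dict[str, Any], maze_state: Any = None) -> Dict[str, int]:
        # The state argument is accepted but ignored.
        return {"action": 0}


def step_policy(policy: Any, observation: Dict[str, Any], get_state: Callable[[], Any]) -> Dict[str, int]:
    """Fetch the state only if the policy asks for it (hypothetical helper)."""
    maze_state = get_state() if policy.needs_state() else None
    return policy.compute_action(observation, maze_state=maze_state)


calls = []


def expensive_state() -> Dict[str, str]:
    # Records each invocation so we can see whether the state was requested.
    calls.append(1)
    return {"full": "state"}


action = step_policy(StatelessPolicy(), {"obs": [0.0]}, expensive_state)
```

Because `needs_state()` returns `False` here, `expensive_state` is never called and `maze_state` is passed as `None`.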

policy_for(actor_id: ActorID | None) → FlatPolicy

Return the policy corresponding to the given actor ID, or the single available policy if no actor ID is provided.
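The lookup described here can be sketched roughly as follows. This is not the library's code: `PolicyRegistry` and `ConstantPolicy` are hypothetical, the `ActorID` stand-in and its `step_key` field used as the policy ID are assumptions, and raising an error when no actor ID is given but several policies exist is one possible design choice.

```python
from typing import Any, Dict, NamedTuple, Optional, Union


class ActorID(NamedTuple):
    """Hypothetical stand-in for the library's ActorID."""
    step_key: Union[str, int]  # assumed to double as the policy ID
    agent_id: int


class ConstantPolicy:
    """Hypothetical flat policy that always returns the same action."""

    def __init__(self, action: int) -> None:
        self.action = action

    def compute_action(self, observation: Dict[str, Any]) -> Dict[str, int]:
        return {"action": self.action}


class PolicyRegistry:
    """Sketch of a container mapping policy IDs to flat policies."""

    def __init__(self, policies: Dict[Union[str, int], ConstantPolicy]) -> None:
        self.policies = policies

    def policy_for(self, actor_id: Optional[ActorID]) -> ConstantPolicy:
        if actor_id is None:
            # Without an actor ID the choice is unambiguous only for a single policy.
            if len(self.policies) != 1:
                raise ValueError("actor_id is required when several policies are registered")
            return next(iter(self.policies.values()))
        return self.policies[actor_id.step_key]

    def compute_action(self, observation: Dict[str, Any], actor_id: Optional[ActorID] = None) -> Dict[str, int]:
        # Delegate to whichever sub-policy matches the actor.
        return self.policy_for(actor_id).compute_action(observation)


registry = PolicyRegistry({"move": ConstantPolicy(0), "attack": ConstantPolicy(1)})
chosen = registry.compute_action({}, ActorID("attack", 0))
single = PolicyRegistry({0: ConstantPolicy(7)}).policy_for(None)
```

With two registered policies, the actor's `step_key` selects the `"attack"` policy; with exactly one policy, `policy_for(None)` returns it directly.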

seed(seed: int) → None

(overrides Policy)

Not applicable, since the global seed should already be set before the models are initialized.