class maze.core.agent.random_policy.RandomPolicy(action_spaces_dict: Dict[Union[str, int], gym.spaces.Space])

Implements a random structured policy.


Parameters: action_spaces_dict – The action_spaces dict of the env; actions are sampled from these spaces.

compute_action(observation: Dict[str, numpy.ndarray], maze_state: Optional[Any], env: Optional[maze.core.env.base_env.BaseEnv] = None, actor_id: Optional[maze.core.env.structured_env.ActorID] = None, deterministic: bool = False) → Dict[str, Union[int, numpy.ndarray]]

(overrides Policy)

Sample a random action from the given action space.
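To make "sampling from a dict of action spaces" concrete, here is a minimal, library-free sketch. It does not use Maze or gym: `ToyDiscrete`, `ToyDictSpace` and `sample_action` are hypothetical names invented for this illustration, and the lookup by sub-step key is an assumption about how a space might be selected, not a statement about the library's implementation.

```python
import random
from typing import Dict, Optional, Union


class ToyDiscrete:
    """Hypothetical stand-in for a discrete space with n choices (0 .. n-1)."""

    def __init__(self, n: int, seed: Optional[int] = None):
        self.n = n
        self._rng = random.Random(seed)

    def sample(self) -> int:
        return self._rng.randrange(self.n)


class ToyDictSpace:
    """Hypothetical stand-in for a dict space: one sub-space per action head."""

    def __init__(self, spaces: Dict[str, ToyDiscrete]):
        self.spaces = spaces

    def sample(self) -> Dict[str, int]:
        # Sample every action head independently.
        return {name: space.sample() for name, space in self.spaces.items()}


def sample_action(action_spaces_dict: Dict[Union[str, int], ToyDictSpace],
                  step_key: Optional[Union[str, int]] = None) -> Dict[str, int]:
    """Pick the space for the given sub-step key (assumed lookup) and sample from it."""
    if step_key is None:
        # Without a key, the sketch only works for a single sub-step.
        assert len(action_spaces_dict) == 1, "step_key required for multiple sub-steps"
        step_key = next(iter(action_spaces_dict))
    return action_spaces_dict[step_key].sample()


toy_spaces = {
    0: ToyDictSpace({"move": ToyDiscrete(4, seed=0), "speed": ToyDiscrete(3, seed=1)}),
    1: ToyDictSpace({"pick": ToyDiscrete(2, seed=2)}),
}
action = sample_action(toy_spaces, step_key=0)
```

The returned `action` is a plain dict with one entry per action head, mirroring the `Dict[str, ...]` return type in the signature above.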

compute_top_action_candidates(observation: Dict[str, numpy.ndarray], num_candidates: Optional[int], maze_state: Optional[Any], env: Optional[maze.core.env.base_env.BaseEnv], actor_id: maze.core.env.structured_env.ActorID = None) → Tuple[Sequence[Dict[str, Union[int, numpy.ndarray]]], Sequence[float]]

(overrides Policy)

Sample multiple random actions from the provided action space (and assign uniform probabilities to the sampled actions).
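The idea of pairing each sampled action with a uniform probability can be sketched in a few lines of plain Python. `sample_candidates` and the toy sampler below are hypothetical helpers written for this example, and the sketch assumes `num_candidates` is a positive integer (the documented signature also allows `None`, which is not handled here).

```python
import random
from typing import Callable, Dict, List, Tuple


def sample_candidates(sample_fn: Callable[[], Dict[str, int]],
                      num_candidates: int) -> Tuple[List[Dict[str, int]], List[float]]:
    """Draw num_candidates random actions and give each the same probability."""
    actions = [sample_fn() for _ in range(num_candidates)]
    # Random sampling carries no preference, so every candidate gets 1 / N.
    probs = [1.0 / num_candidates] * num_candidates
    return actions, probs


rng = random.Random(0)  # local generator so the example is repeatable
actions, probs = sample_candidates(lambda: {"move": rng.randrange(4)}, num_candidates=5)
```

Both returned sequences have the same length, and the probabilities sum to one, matching the `Tuple[Sequence[...], Sequence[float]]` shape of the documented return value.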


needs_state() → bool

(overrides Policy)

This policy does not require the state() object to compute the action.

seed(seed: int) → None

(overrides Policy)

Seed the policy by setting the action space seeds.
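Seeding the action spaces makes random sampling reproducible. The sketch below shows that effect with a plain-Python stand-in: `ToySeededSpace`, `seed_spaces` and `rollout` are hypothetical names, and passing the same seed to every space is an assumption made for simplicity, not a description of how the library derives per-space seeds.

```python
import random
from typing import Dict, List, Tuple


class ToySeededSpace:
    """Hypothetical stand-in for a discrete space that owns its own RNG."""

    def __init__(self, n: int):
        self.n = n
        self._rng = random.Random()

    def seed(self, seed: int) -> None:
        self._rng.seed(seed)

    def sample(self) -> int:
        return self._rng.randrange(self.n)


def seed_spaces(spaces: Dict[int, ToySeededSpace], seed: int) -> None:
    """Seed every space in the dict (same seed for each one, by assumption)."""
    for space in spaces.values():
        space.seed(seed)


def rollout(seed: int, steps: int = 10) -> List[Tuple[int, int]]:
    """Create fresh spaces, seed them, and record a short sequence of samples."""
    spaces = {0: ToySeededSpace(4), 1: ToySeededSpace(6)}
    seed_spaces(spaces, seed)
    return [(spaces[0].sample(), spaces[1].sample()) for _ in range(steps)]


first = rollout(1234)
second = rollout(1234)
```

With the same seed, `first` and `second` contain identical action sequences, which is the property that makes seeded random baselines comparable across runs.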