RandomPolicy

class maze.core.agent.random_policy.RandomPolicy(action_spaces_dict: Dict[Union[str, int], gym.spaces.Space])

Implements a random structured policy: actions are sampled from the environment's action spaces, independently of the observation.

Parameters: action_spaces_dict – The action_spaces dict of the env, holding one action space per step key. Actions are sampled from these spaces.
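To make the shape of action_spaces_dict concrete, here is a minimal sketch in plain Python. It assumes each step key maps to a dict-like structured space whose sub-actions are discrete; DiscreteSpace and sample_structured are hypothetical stand-ins for gym.spaces.Discrete and gym.spaces.Dict.sample(), not part of Maze or gym.

```python
import random
from typing import Dict, Union


class DiscreteSpace:
    """Hypothetical stand-in for gym.spaces.Discrete(n): samples an int in [0, n)."""

    def __init__(self, n: int, rng: random.Random):
        self.n = n
        self.rng = rng

    def sample(self) -> int:
        return self.rng.randrange(self.n)


rng = random.Random(0)

# Assumed layout: one structured (dict) action space per step key.
action_spaces_dict: Dict[Union[str, int], Dict[str, DiscreteSpace]] = {
    0: {"move": DiscreteSpace(4, rng)},
    1: {"pick": DiscreteSpace(2, rng), "slot": DiscreteSpace(3, rng)},
}


def sample_structured(space: Dict[str, DiscreteSpace]) -> Dict[str, int]:
    """Sample every sub-action of a dict space, one value per key."""
    return {name: sub.sample() for name, sub in space.items()}


# One random structured action per step key.
actions = {key: sample_structured(space) for key, space in action_spaces_dict.items()}
print(actions)
```

Each sampled action is a dict keyed by sub-action name, matching the Dict[str, Union[int, numpy.ndarray]] return type of compute_action.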

compute_action(observation: Dict[str, numpy.ndarray], maze_state: Optional[Any], env: Optional[maze.core.env.base_env.BaseEnv] = None, actor_id: Optional[maze.core.env.structured_env.ActorID] = None, deterministic: bool = False) → Dict[str, Union[int, numpy.ndarray]]

(overrides Policy)

Samples a random action from the given action space.
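The sketch below shows one plausible way compute_action can pick the action space to sample from. It assumes the space is selected via actor_id.step_key when an actor ID is given, and that a missing actor ID is only unambiguous when there is a single action space. ActorIDSketch and RandomPolicySketch are hypothetical illustrations, not the Maze implementation, and sub-actions are simplified to "number of choices" integers.

```python
import random
from dataclasses import dataclass
from typing import Any, Dict, Optional, Union

StepKey = Union[str, int]


@dataclass(frozen=True)
class ActorIDSketch:
    """Hypothetical stand-in for an actor ID, reduced to the two fields used here."""
    step_key: StepKey
    agent_id: int


class RandomPolicySketch:
    """Minimal sketch of random action selection; not the Maze implementation."""

    def __init__(self, action_spaces_dict: Dict[StepKey, Dict[str, int]], seed: int = 0):
        # Simplification: each sub-action is described only by its number of choices n.
        self.action_spaces_dict = action_spaces_dict
        self.rng = random.Random(seed)

    def compute_action(self, observation: Dict[str, Any], maze_state: Optional[Any] = None,
                       env: Optional[Any] = None, actor_id: Optional[ActorIDSketch] = None,
                       deterministic: bool = False) -> Dict[str, int]:
        if actor_id is not None:
            # Use the action space of the sub-step that is currently acting.
            space = self.action_spaces_dict[actor_id.step_key]
        else:
            # Without an actor ID the choice is only unambiguous for a single space.
            if len(self.action_spaces_dict) != 1:
                raise ValueError("actor_id is required when there are several action spaces")
            space = next(iter(self.action_spaces_dict.values()))
        # observation, maze_state, env and deterministic are ignored in this sketch.
        return {name: self.rng.randrange(n) for name, n in space.items()}


policy = RandomPolicySketch({0: {"move": 4}, 1: {"pick": 2, "slot": 3}})
action = policy.compute_action({}, actor_id=ActorIDSketch(step_key=1, agent_id=0))
print(action)
```

Raising an error instead of guessing a space keeps multi-step setups from silently sampling actions for the wrong sub-step.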

compute_top_action_candidates(observation: Dict[str, numpy.ndarray], num_candidates: Optional[int], maze_state: Optional[Any], env: Optional[maze.core.env.base_env.BaseEnv], actor_id: maze.core.env.structured_env.ActorID = None) → Tuple[Sequence[Dict[str, Union[int, numpy.ndarray]]], Sequence[float]]

(overrides Policy)

Samples multiple random actions from the provided action space and assigns uniform probabilities to the sampled actions.
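A minimal sketch of the candidate sampling described above, assuming "uniform probabilities" means each of the num_candidates actions gets probability 1/num_candidates. The function random_top_candidates is hypothetical, and its fallback to a single candidate when num_candidates is None is an assumption made only for this sketch.

```python
import random
from typing import Dict, List, Optional, Tuple


def random_top_candidates(space: Dict[str, int], num_candidates: Optional[int],
                          rng: random.Random) -> Tuple[List[Dict[str, int]], List[float]]:
    """Sketch: draw num_candidates random actions, each with probability 1/num_candidates."""
    if num_candidates is None:
        num_candidates = 1  # assumption for this sketch only
    if num_candidates < 1:
        raise ValueError("num_candidates must be positive")
    # Each sub-action is described only by its number of choices n.
    actions = [{name: rng.randrange(n) for name, n in space.items()}
               for _ in range(num_candidates)]
    probs = [1.0 / num_candidates] * num_candidates
    return actions, probs


actions, probs = random_top_candidates({"move": 4}, num_candidates=5, rng=random.Random(0))
print(actions, probs)
```

Because the candidates are drawn independently and not deduplicated in this sketch, the same action can appear more than once; each copy still receives the same probability.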

needs_state() → bool

(overrides Policy)

This policy does not require the environment state (maze_state) to compute the action, so it returns False.

seed(seed: int) → None

(overrides Policy)

Seeds the policy by seeding each of its action spaces.
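Seeding the action spaces is what makes the random actions reproducible. The sketch below illustrates this with a hypothetical DiscreteSpace stand-in (each space owns its own seedable RNG, as gym spaces do) and a hypothetical seed_action_spaces helper. Offsetting the seed per space is a choice of this sketch, not something stated for Maze.

```python
import random
from typing import Dict, List, Union


class DiscreteSpace:
    """Hypothetical stand-in for gym.spaces.Discrete(n) with its own seedable RNG."""

    def __init__(self, n: int):
        self.n = n
        self._rng = random.Random()

    def seed(self, seed: int) -> None:
        self._rng.seed(seed)

    def sample(self) -> int:
        return self._rng.randrange(self.n)


def seed_action_spaces(action_spaces_dict: Dict[Union[str, int], DiscreteSpace], seed: int) -> None:
    """Sketch of seed(): forward a seed to every action space in the dict."""
    for offset, space in enumerate(action_spaces_dict.values()):
        # Per-space offset so identically shaped spaces do not draw identical sequences.
        space.seed(seed + offset)


def sample_sequence(seed: int) -> List[int]:
    spaces = {0: DiscreteSpace(10), 1: DiscreteSpace(10)}
    seed_action_spaces(spaces, seed)
    return [spaces[key].sample() for key in (0, 1, 0, 1)]


print(sample_sequence(42) == sample_sequence(42))  # prints True
```

Passing the identical seed to every space would also be reproducible, but spaces with the same shape would then produce the same sample sequence.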