FlatPolicy
-
class maze.core.agent.flat_policy.FlatPolicy
Generic flat policy interface.
-
abstract compute_action(observation: Dict[str, numpy.ndarray], deterministic: bool) → Dict[str, Union[int, numpy.ndarray]]
Pick the next action based on the current observation.
- Parameters
observation – Current observation of the environment
deterministic – Whether the action should be computed deterministically
- Returns
Next action to take
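As a rough illustration of the contract above, the sketch below implements `compute_action` on a local stand-in base class rather than on the real Maze class, so it runs without Maze or NumPy installed. `FlatPolicySketch`, `ScoredDiscretePolicy`, and the keys `"scores"` and `"action"` are hypothetical names chosen for the example, and plain lists stand in for `numpy.ndarray` values. Reading `deterministic=True` as "take the highest-scoring action" and `deterministic=False` as "sample" is one plausible interpretation, not something the interface itself prescribes.

```python
import random
from abc import ABC, abstractmethod
from typing import Any, Dict


class FlatPolicySketch(ABC):
    """Local stand-in that mirrors the documented method (not the Maze class)."""

    @abstractmethod
    def compute_action(self, observation: Dict[str, Any], deterministic: bool) -> Dict[str, int]:
        """Pick the next action based on the current observation."""


class ScoredDiscretePolicy(FlatPolicySketch):
    """Hypothetical policy that reads per-action scores from the observation."""

    def __init__(self, seed: int = 0) -> None:
        # A private generator keeps sampling reproducible for a given seed.
        self._rng = random.Random(seed)

    def compute_action(self, observation: Dict[str, Any], deterministic: bool) -> Dict[str, int]:
        scores = list(observation["scores"])  # hypothetical observation key
        indices = range(len(scores))
        if deterministic:
            # Assumed meaning: always return the highest-scoring action index.
            action = max(indices, key=lambda i: scores[i])
        else:
            # Assumed meaning: sample an index with probability proportional to its score.
            action = self._rng.choices(indices, weights=scores, k=1)[0]
        return {"action": action}  # hypothetical action key


policy = ScoredDiscretePolicy(seed=0)
obs = {"scores": [0.1, 0.7, 0.2]}
greedy = policy.compute_action(obs, deterministic=True)
sampled = policy.compute_action(obs, deterministic=False)
```

With these scores the deterministic call always selects index 1, while the sampled call can return any of the three indices.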
-
abstract compute_top_action_candidates(observation: Dict[str, numpy.ndarray], num_candidates: Optional[int]) → Tuple[Sequence[Dict[str, Union[int, numpy.ndarray]]], Sequence[float]]
Get the top num_candidates actions together with the scores (probabilities, Q-values, etc.) that led to the decision.
- Parameters
observation – Current observation of the environment
num_candidates – The number of actions that should be returned
- Returns
A tuple of two sequences: the first holds the candidate actions, the second holds the associated probabilities (or other scores, such as Q-values)
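The return shape can be sketched with a small standalone helper. This is not Maze code: `top_action_candidates` and the `"action"` key are hypothetical, the input is a plain list of per-action probabilities instead of an observation, and treating `num_candidates=None` as "return every action" is an assumption, since the interface does not state what `None` means.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def top_action_candidates(
    probabilities: Sequence[float], num_candidates: Optional[int]
) -> Tuple[List[Dict[str, int]], List[float]]:
    """Return the most probable actions and their probabilities, best first."""
    # Rank action indices from most to least probable.
    ranked = sorted(range(len(probabilities)), key=lambda i: probabilities[i], reverse=True)
    # Assumption: None means "no limit", so all actions are returned.
    if num_candidates is not None:
        ranked = ranked[:num_candidates]
    actions = [{"action": i} for i in ranked]  # hypothetical action key
    scores = [probabilities[i] for i in ranked]
    return actions, scores


actions, scores = top_action_candidates([0.1, 0.7, 0.2], num_candidates=2)
all_actions, all_scores = top_action_candidates([0.1, 0.7, 0.2], num_candidates=None)
```

The two returned sequences are index-aligned, so `scores[k]` is the value associated with `actions[k]`.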
-
abstract