FlatPolicy

class maze.core.agent.flat_policy.FlatPolicy

Generic flat policy interface.

abstract compute_action(observation: Dict[str, numpy.ndarray], deterministic: bool) → Dict[str, int | numpy.ndarray]

Pick the next action based on the current observation.

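Parameters:

observation – Current observation of the environment.

deterministic – Whether the action should be computed deterministically.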

Returns:

Next action to take
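
As a rough illustration of this contract, the sketch below implements compute_action on a stand-in abstract class that mirrors the documented signature. Everything in it is hypothetical: FlatPolicySketch, GreedyScorePolicy, the observation key "scores" and the action key "choice" are invented for the example, and plain Python lists stand in for numpy.ndarray so the snippet runs without third-party packages. A real policy would presumably subclass maze.core.agent.flat_policy.FlatPolicy instead.

```python
import random
from abc import ABC, abstractmethod
from typing import Dict, Sequence


class FlatPolicySketch(ABC):
    """Stand-in mirroring the documented compute_action signature (not the Maze class)."""

    @abstractmethod
    def compute_action(self, observation: Dict[str, Sequence[float]], deterministic: bool) -> Dict[str, int]:
        """Pick the next action based on the current observation."""


class GreedyScorePolicy(FlatPolicySketch):
    """Hypothetical policy: treats observation["scores"] as non-negative per-action weights."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)

    def compute_action(self, observation, deterministic):
        scores = list(observation["scores"])
        if deterministic:
            # Always return the index of the highest score.
            choice = max(range(len(scores)), key=scores.__getitem__)
        else:
            # Sample an index with probability proportional to its score.
            choice = self._rng.choices(range(len(scores)), weights=scores, k=1)[0]
        return {"choice": choice}


policy = GreedyScorePolicy()
action = policy.compute_action({"scores": [0.1, 0.7, 0.2]}, deterministic=True)
print(action)  # {'choice': 1}
```

With deterministic=False the same call may return a different index on each invocation, which is why the flag matters for reproducible evaluation runs.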

abstract compute_top_action_candidates(observation: Dict[str, numpy.ndarray], num_candidates: int | None) → Tuple[Sequence[Dict[str, int | numpy.ndarray]], Sequence[float]]

Get the top num_candidates actions, together with the scores (probabilities, Q-values, etc.) that led to the decision.

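Parameters:

observation – Current observation of the environment.

num_candidates – Number of top action candidates to return.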

Returns:

A tuple of two sequences: the first contains the candidate actions, the second the associated probabilities.
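
The return shape can be sketched with a small standalone function. This is not Maze code: top_action_candidates and the action key "choice" are hypothetical, the input is a plain list of per-action probabilities rather than an observation, and treating num_candidates=None as "return every candidate" is an assumption that this section does not state.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def top_action_candidates(
    probabilities: Sequence[float], num_candidates: Optional[int]
) -> Tuple[List[Dict[str, int]], List[float]]:
    """Rank discrete actions by probability and return the best ones.

    Assumption: num_candidates=None returns every action.
    """
    # Sort action indices from most to least probable.
    ranked = sorted(range(len(probabilities)), key=lambda i: probabilities[i], reverse=True)
    if num_candidates is not None:
        ranked = ranked[:num_candidates]
    actions = [{"choice": i} for i in ranked]
    scores = [probabilities[i] for i in ranked]
    return actions, scores


actions, scores = top_action_candidates([0.1, 0.7, 0.2], num_candidates=2)
print(actions)  # [{'choice': 1}, {'choice': 2}]
print(scores)   # [0.7, 0.2]
```

The two returned sequences are index-aligned, so scores[k] belongs to actions[k]; a caller can zip them to inspect why one candidate was preferred over another.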