PolicyOutput

class maze.core.agent.torch_policy_output.PolicyOutput

A structured representation of a policy output over a full (flat) environment step.

property action_logits: List[Dict[str, torch.Tensor]]: List of action logits for the individual sub-steps.

actor_ids() → List[ActorID]: List of actor IDs for the individual sub-steps.

append(value: PolicySubStepOutput): Append the given PolicySubStepOutput.

property embedding_logits: List[Dict[str, torch.Tensor]]: List of embedding logits for the individual sub-steps.

property entropies: List[torch.Tensor]: List of entropies (of the probability distributions) for the individual sub-steps.

log_probs_for_actions(actions: List[Dict[str, torch.Tensor]]) → List[Dict[str, torch.Tensor]]: Compute the action log probabilities for the given actions.

    Parameters: actions, the actions to use.
    Returns: the computed action log probabilities.
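This page does not say how the log probabilities are derived. As a rough illustration only, assuming every action head is a categorical distribution parameterised by logits (an assumption, not something stated here), the per-sub-step computation could look like the pure-Python sketch below. Plain lists and integers stand in for torch.Tensor, and the free functions `log_softmax` and `log_probs_for_actions` are hypothetical helpers written for this example, not the Maze implementation.

```python
import math
from typing import Dict, List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over a flat list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def log_probs_for_actions(
    action_logits: List[Dict[str, List[float]]],
    actions: List[Dict[str, int]],
) -> List[Dict[str, float]]:
    """Return one dict of log probabilities per sub-step, keyed by action head.

    Assumes categorical heads: the log probability of an action is the
    log-softmax of that head's logits, read at the chosen action index.
    """
    result = []
    for step_logits, step_actions in zip(action_logits, actions):
        result.append({
            head: log_softmax(step_logits[head])[index]
            for head, index in step_actions.items()
        })
    return result


# Two sub-steps, each with a single (hypothetical) action head.
logits = [{"move": [0.0, 0.0]}, {"pick": [1.0, 2.0, 3.0]}]
chosen = [{"move": 1}, {"pick": 2}]
log_probs = log_probs_for_actions(logits, chosen)
```

The result mirrors the shape of the input: one dictionary per sub-step, with the same keys as the corresponding action dictionary.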

property prob_dist: List[DictProbabilityDistribution]: List of dictionary probability distributions for the individual sub-steps.
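The members above share one pattern: sub-step outputs are appended in order, and each accessor returns a list with one entry per sub-step. The following is a minimal sketch of that pattern under stated assumptions. `SubStepOutputSketch` and `PolicyOutputSketch` are hypothetical stand-ins, plain Python values replace torch.Tensor, and a tuple replaces ActorID; none of this is the actual Maze code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SubStepOutputSketch:
    """Hypothetical stand-in for PolicySubStepOutput."""
    actor_id: Tuple[str, int]              # stand-in for ActorID
    action_logits: Dict[str, List[float]]  # stand-in for Dict[str, torch.Tensor]
    entropy: float                         # stand-in for torch.Tensor


@dataclass
class PolicyOutputSketch:
    """Hypothetical stand-in for PolicyOutput: an ordered list of sub-step outputs."""
    _steps: List[SubStepOutputSketch] = field(default_factory=list)

    def append(self, value: SubStepOutputSketch) -> None:
        """Append the output of one sub-step."""
        self._steps.append(value)

    def actor_ids(self) -> List[Tuple[str, int]]:
        """Actor IDs in sub-step order."""
        return [step.actor_id for step in self._steps]

    @property
    def action_logits(self) -> List[Dict[str, List[float]]]:
        """Action logits in sub-step order."""
        return [step.action_logits for step in self._steps]

    @property
    def entropies(self) -> List[float]:
        """Entropies in sub-step order."""
        return [step.entropy for step in self._steps]


# Collect two sub-steps of one flat environment step.
output = PolicyOutputSketch()
output.append(SubStepOutputSketch(("step_0", 0), {"move": [0.0, 0.0]}, 0.69))
output.append(SubStepOutputSketch(("step_1", 0), {"pick": [1.0, 2.0, 3.0]}, 0.83))
```

Because every accessor iterates over the same list, index `i` in `actor_ids()`, `action_logits`, and `entropies` always refers to the same sub-step.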