PolicyOutput

class maze.core.agent.torch_policy_output.PolicyOutput

A structured representation of a policy output over a full (flat) environment step.
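As a rough illustration of how such a container can be organized, the sketch below keeps one output per sub-step and exposes each field as a per-sub-step list. All names here (`ActorIDSketch`, `SubStepOutputSketch`, `PolicyOutputSketch`) are hypothetical stand-ins, not Maze classes. The `(step_key, agent_id)` layout of the actor ID is an assumption. Plain float lists replace `torch.Tensor` so that the example runs without PyTorch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, NamedTuple, Union


class ActorIDSketch(NamedTuple):
    """Hypothetical stand-in for ActorID: (sub-step key, agent id) is an assumed layout."""
    step_key: Union[str, int]
    agent_id: int


@dataclass
class SubStepOutputSketch:
    """Hypothetical stand-in for PolicySubStepOutput; logits are plain float lists here."""
    actor_id: ActorIDSketch
    action_logits: Dict[str, List[float]]
    embedding_logits: Dict[str, List[float]]


@dataclass
class PolicyOutputSketch:
    """Collects one SubStepOutputSketch per sub-step of a flat environment step."""
    sub_step_outputs: List[SubStepOutputSketch] = field(default_factory=list)

    def append(self, value: SubStepOutputSketch) -> None:
        # Sub-step outputs are stored in the order in which they are appended.
        self.sub_step_outputs.append(value)

    def actor_ids(self) -> List[ActorIDSketch]:
        return [o.actor_id for o in self.sub_step_outputs]

    @property
    def action_logits(self) -> List[Dict[str, List[float]]]:
        return [o.action_logits for o in self.sub_step_outputs]

    @property
    def embedding_logits(self) -> List[Dict[str, List[float]]]:
        return [o.embedding_logits for o in self.sub_step_outputs]


# Usage: an environment step made of two sub-steps.
out = PolicyOutputSketch()
out.append(SubStepOutputSketch(ActorIDSketch(0, 0), {"move": [0.1, 2.0]}, {}))
out.append(SubStepOutputSketch(ActorIDSketch(1, 0), {"fire": [1.5, -0.3]}, {}))
print(out.actor_ids())    # [ActorIDSketch(step_key=0, agent_id=0), ActorIDSketch(step_key=1, agent_id=0)]
print(out.action_logits)  # [{'move': [0.1, 2.0]}, {'fire': [1.5, -0.3]}]
```

Keeping a single list of sub-step outputs and deriving every per-sub-step view from it guarantees that index i of each list refers to the same sub-step.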

property action_logits: List[Dict[str, torch.Tensor]]

List of action logits for the individual sub-steps.

actor_ids() → List[ActorID]

List of actor IDs for the individual sub-steps.

append(value: PolicySubStepOutput)

Append a given PolicySubStepOutput.

property embedding_logits: List[Dict[str, torch.Tensor]]

List of embedding logits for the individual sub-steps.

property entropies: List[torch.Tensor]

List of entropies of the probability distributions of the individual sub-steps.
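For a categorical distribution with probabilities p_i = softmax(logits)_i, the entropy is H = -Σ_i p_i log p_i. The minimal pure-Python sketch below computes one entropy value per sub-step, assuming discrete (categorical) action heads and plain float lists in place of tensors. Summing the entropies of the heads within a sub-step, as if the heads were independent, is an assumption made for illustration; the reduction Maze actually applies is not stated here.

```python
import math
from typing import Dict, List


def softmax(logits: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_entropy(logits: List[float]) -> float:
    # H(p) = -sum_i p_i * log(p_i), skipping zero-probability entries.
    return -sum(p * math.log(p) for p in softmax(logits) if p > 0.0)


def sub_step_entropies(action_logits: List[Dict[str, List[float]]]) -> List[float]:
    # One value per sub-step: the heads are assumed independent, so their
    # entropies add up to the joint entropy of the sub-step's action.
    return [
        sum(categorical_entropy(head_logits) for head_logits in step.values())
        for step in action_logits
    ]


logits = [{"move": [0.0, 0.0]}, {"fire": [0.0, 0.0, 0.0, 0.0], "aim": [5.0, 5.0]}]
print(sub_step_entropies(logits))  # approximately [0.693, 2.079], i.e. [ln 2, ln 8]
```

Uniform logits give the maximum entropy ln(n) for n actions, which makes the example values easy to check by hand.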

log_probs_for_actions(actions: List[Dict[str, torch.Tensor]]) → List[Dict[str, torch.Tensor]]

Compute the action log probabilities for the given actions.

Parameters:
    actions: The actions to use, one dictionary per sub-step.

Returns:
    The computed action log probabilities, one dictionary per sub-step.
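For discrete action heads, the log probability of a chosen action is the log-softmax of the head's logits evaluated at that action's index. The sketch below is a hypothetical free-function analogue of log_probs_for_actions, not the Maze implementation. It assumes categorical heads, integer action indices, and plain float lists in place of tensors; continuous heads would need a different distribution.

```python
import math
from typing import Dict, List


def log_softmax(logits: List[float]) -> List[float]:
    # log p_i = x_i - logsumexp(x), using the max-shift trick for stability.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def log_probs_for_actions(action_logits: List[Dict[str, List[float]]],
                          actions: List[Dict[str, int]]) -> List[Dict[str, float]]:
    # For every sub-step and every action head, look up the log probability
    # of the action that was actually taken.
    return [
        {key: log_softmax(step_logits[key])[step_actions[key]] for key in step_actions}
        for step_logits, step_actions in zip(action_logits, actions)
    ]


logits = [{"move": [0.0, 0.0, 0.0, 0.0]}, {"fire": [2.0, 0.0]}]
actions = [{"move": 3}, {"fire": 0}]
# move: log(1/4) ≈ -1.386, fire: log(e**2 / (e**2 + 1)) ≈ -0.127
print(log_probs_for_actions(logits, actions))
```

Working in log space via logsumexp avoids taking the log of very small probabilities, which is why log-softmax is preferred over log(softmax(x)).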

property prob_dist: List[DictProbabilityDistribution]

List of probability distributions for the individual sub-steps (one DictProbabilityDistribution per sub-step).
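Each entry of prob_dist can be thought of as a dictionary of per-action-head distributions. The sketch below builds one such dictionary distribution per sub-step from that sub-step's action logits and shows stochastic and deterministic (argmax) action selection. The classes `CategoricalSketch` and `DictDistributionSketch` and their method names are hypothetical; only categorical heads are covered.

```python
import math
import random
from typing import Dict, List


class CategoricalSketch:
    """Hypothetical categorical distribution over len(logits) discrete actions."""

    def __init__(self, logits: List[float]):
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        self.probs = [e / total for e in exps]

    def deterministic_sample(self) -> int:
        # Most likely action (argmax of the probabilities).
        return max(range(len(self.probs)), key=self.probs.__getitem__)

    def sample(self, rng: random.Random) -> int:
        # Draw an action index with probability proportional to self.probs.
        return rng.choices(range(len(self.probs)), weights=self.probs, k=1)[0]


class DictDistributionSketch:
    """Hypothetical dictionary distribution: one categorical per action key."""

    def __init__(self, logits: Dict[str, List[float]]):
        self.distribution_dict = {k: CategoricalSketch(v) for k, v in logits.items()}

    def deterministic_sample(self) -> Dict[str, int]:
        return {k: d.deterministic_sample() for k, d in self.distribution_dict.items()}

    def sample(self, rng: random.Random) -> Dict[str, int]:
        return {k: d.sample(rng) for k, d in self.distribution_dict.items()}


# One dictionary distribution per sub-step, built from that sub-step's action logits.
action_logits = [{"move": [0.1, 2.0, -1.0]}, {"fire": [3.0, 0.0], "aim": [0.0, 0.5]}]
prob_dist = [DictDistributionSketch(step) for step in action_logits]
print([d.deterministic_sample() for d in prob_dist])  # [{'move': 1}, {'fire': 0, 'aim': 1}]
```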