PolicyOutput
- class maze.core.agent.torch_policy_output.PolicyOutput
A structured representation of a policy output over a full (flat) environment step.
- property action_logits: List[Dict[str, torch.Tensor]]
List of action logits for the individual sub-steps.
- append(value: PolicySubStepOutput)
Append a given PolicySubStepOutput.
- property embedding_logits: List[Dict[str, torch.Tensor]]
List of embedding logits for the individual sub-steps.
- property entropies: List[torch.Tensor]
List of entropies of the probability distributions of the individual sub-steps.
- log_probs_for_actions(actions: List[Dict[str, torch.Tensor]]) → List[Dict[str, torch.Tensor]]
Compute the action log probabilities for the given actions.
Parameter actions: The actions to use.
Returns: The computed action log probabilities.
- property prob_dist: List[DictProbabilityDistribution]
List of probability distributions (one DictProbabilityDistribution per sub-step).
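
The per-sub-step layout described above can be illustrated with a small, torch-free sketch. It does not use the Maze API: plain Python lists stand in for `torch.Tensor`, every action head is assumed to be categorical, and the helper names (`log_softmax`, `entropy`, the free function `log_probs_for_actions`) are hypothetical stand-ins that only mirror the documented shapes. Summing head entropies per sub-step is also an assumption (independent heads), not something the reference above states.

```python
import math
from typing import Dict, List

# Hypothetical stand-in for the documented layout: one dict of logits per
# sub-step, keyed by action name. Plain lists replace torch.Tensor here.
Logits = Dict[str, List[float]]


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def entropy(logits: List[float]) -> float:
    """Entropy of the categorical distribution defined by the logits."""
    log_p = log_softmax(logits)
    return -sum(math.exp(lp) * lp for lp in log_p)


def log_probs_for_actions(action_logits: List[Logits],
                          actions: List[Dict[str, int]]) -> List[Dict[str, float]]:
    """For each sub-step, look up the log probability of the chosen action index."""
    return [
        {name: log_softmax(step_logits[name])[idx]
         for name, idx in step_actions.items()}
        for step_logits, step_actions in zip(action_logits, actions)
    ]


# Two sub-steps, each with its own (assumed categorical) action head.
action_logits = [{"move": [0.0, 0.0]}, {"pick": [1.0, 2.0, 3.0]}]
actions = [{"move": 1}, {"pick": 2}]

# One dict of log probabilities per sub-step, matching the list-of-dicts shape.
log_probs = log_probs_for_actions(action_logits, actions)

# One entropy value per sub-step (assumption: head entropies are summed).
entropies = [sum(entropy(head) for head in step.values()) for step in action_logits]
```

The point of the sketch is the shape: every property and return value is a list with one entry per sub-step, and the dictionary entries are keyed by action name.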