PolicyOutput
-
class
maze.core.agent.torch_policy_output.PolicyOutput
A structured representation of the policy output for one full (flat) environment step, holding the outputs of its individual sub-steps (PolicySubStepOutput).
-
property
action_logits
List of action logits for the individual sub-steps.
-
actor_ids() → List[maze.core.env.structured_env.ActorID]
List of actor IDs for the individual sub-steps.
-
append(value: maze.core.agent.torch_policy_output.PolicySubStepOutput)
Append the given PolicySubStepOutput.
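To make the list-per-sub-step layout concrete, here is a minimal, stdlib-only sketch. It is not Maze's implementation: `SubStepOutput`, `StructuredOutput` and the tuple-based `ActorID` are hypothetical stand-ins, and plain float lists replace torch tensors. It only mirrors the documented shape: `append` adds one sub-step, and the accessors return one entry per appended sub-step.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical stand-in for maze.core.env.structured_env.ActorID; the real type may differ.
ActorID = Tuple[str, int]  # (sub-step key, agent id)


@dataclass
class SubStepOutput:
    """Hypothetical, simplified stand-in for PolicySubStepOutput."""
    actor_id: ActorID
    action_logits: Dict[str, List[float]]  # logits per action head


@dataclass
class StructuredOutput:
    """Illustrative container mirroring the list-per-sub-step layout of PolicyOutput."""
    sub_steps: List[SubStepOutput] = field(default_factory=list)

    def append(self, value: SubStepOutput) -> None:
        # One entry is appended per sub-step of the flat environment step.
        self.sub_steps.append(value)

    def actor_ids(self) -> List[ActorID]:
        # Method, as in the documented actor_ids() signature.
        return [s.actor_id for s in self.sub_steps]

    @property
    def action_logits(self) -> List[Dict[str, List[float]]]:
        # Property, as in the documented action_logits entry.
        return [s.action_logits for s in self.sub_steps]


out = StructuredOutput()
out.append(SubStepOutput(("step_0", 0), {"move": [0.1, 2.0]}))
out.append(SubStepOutput(("step_1", 0), {"fire": [1.0, -1.0, 0.5]}))
print(out.actor_ids())  # [('step_0', 0), ('step_1', 0)]
```

Keeping the per-sub-step objects in one list and deriving each accessor from it keeps all accessors aligned by index, so entry i of every list refers to the same sub-step.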
-
property
embedding_logits
List of embedding logits for the individual sub-steps.
-
property
entropies
List of entropies of the probability distributions of the individual sub-steps.
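Per-sub-step entropies can be illustrated for categorical action heads with the sketch below. It is hypothetical: it assumes each sub-step maps action-head names to plain logit lists and computes H(p) = -Σ p·log p per head, keeping one value per head. Maze itself operates on torch-based distributions, and how it aggregates entropies across heads is not assumed here.

```python
import math
from typing import Dict, List


def softmax(logits: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_entropy(logits: List[float]) -> float:
    # H(p) = -sum_i p_i * log(p_i); zero-probability terms contribute nothing.
    return -sum(p * math.log(p) for p in softmax(logits) if p > 0.0)


def sub_step_entropies(
    action_logits: List[Dict[str, List[float]]],
) -> List[Dict[str, float]]:
    # Hypothetical helper: one dict per sub-step, one entropy per action head.
    return [
        {head: categorical_entropy(logits) for head, logits in step.items()}
        for step in action_logits
    ]


# Uniform logits: each entropy equals log(number of actions).
ents = sub_step_entropies([{"move": [0.0, 0.0]}, {"fire": [0.0, 0.0, 0.0, 0.0]}])
```

A near-deterministic head (one logit far above the others) yields an entropy close to zero, which is why entropies are commonly used as an exploration signal.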
-
log_probs_for_actions(actions: List[Dict[str, torch.Tensor]]) → List[Dict[str, torch.Tensor]]
Compute the log-probabilities of the given actions.
Parameters: actions – The actions for which to compute log-probabilities.
Returns: The computed action log-probabilities.
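The sketch below shows what scoring actions per sub-step can look like, assuming categorical action heads, integer action indices and plain logit lists in place of torch tensors. `log_softmax` and this standalone `log_probs_for_actions` function are hypothetical illustrations; only the list-of-dicts input and output shape is taken from the signature above.

```python
import math
from typing import Dict, List


def log_softmax(logits: List[float]) -> List[float]:
    # log p_i = x_i - logsumexp(x), computed stably by factoring out the max.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def log_probs_for_actions(
    action_logits: List[Dict[str, List[float]]],
    actions: List[Dict[str, int]],
) -> List[Dict[str, float]]:
    # Hypothetical: sub-step i's actions are scored against sub-step i's
    # logits, head by head, by indexing the log-softmax with the action.
    assert len(action_logits) == len(actions), "one action dict per sub-step"
    return [
        {head: log_softmax(logits[head])[a] for head, a in acts.items()}
        for logits, acts in zip(action_logits, actions)
    ]


lp = log_probs_for_actions(
    [{"move": [0.0, 0.0]}, {"fire": [0.0, 0.0, 0.0, 0.0]}],
    [{"move": 1}, {"fire": 3}],
)
```

Working in log space via log-sum-exp avoids the underflow that taking `math.log` of a very small softmax probability would cause.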
-
property
prob_dist
List of probability distribution dictionaries for the individual sub-steps.
-
property