PolicyOutput

class maze.core.agent.torch_policy_output.PolicyOutput

A structured representation of a policy output over a full (flat) environment step.

property action_logits

List of action logits for the individual sub-steps.

actor_ids() → List[maze.core.env.structured_env.ActorID]

List of actor IDs for the individual sub-steps.

append(value: maze.core.agent.torch_policy_output.PolicySubStepOutput)

Append a given PolicySubStepOutput.
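The container pattern described here (sub-step outputs appended one at a time, then read back as per-sub-step lists) can be illustrated with a minimal sketch. `SubStepOutput`, `FlatStepOutput`, and `ActorKey` are hypothetical stand-ins invented for this example; they are not the Maze classes, and they use plain Python lists instead of `torch.Tensor` so the snippet has no dependencies.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical stand-in for an actor ID: (sub-step key, agent index).
ActorKey = Tuple[str, int]


@dataclass
class SubStepOutput:
    """Stand-in for the output of one sub-step (assumed fields)."""
    actor_id: ActorKey
    action_logits: Dict[str, List[float]]


@dataclass
class FlatStepOutput:
    """Stand-in container that collects sub-step outputs in order."""
    _sub_steps: List[SubStepOutput] = field(default_factory=list)

    def append(self, value: SubStepOutput) -> None:
        # Sub-steps are stored in the order they are appended.
        self._sub_steps.append(value)

    def actor_ids(self) -> List[ActorKey]:
        # One actor ID per sub-step.
        return [s.actor_id for s in self._sub_steps]

    @property
    def action_logits(self) -> List[Dict[str, List[float]]]:
        # One logits dictionary per sub-step.
        return [s.action_logits for s in self._sub_steps]


output = FlatStepOutput()
output.append(SubStepOutput(("select", 0), {"item": [0.1, 0.9]}))
output.append(SubStepOutput(("place", 0), {"slot": [0.3, 0.3, 0.4]}))
```

In this sketch, index `i` of every returned list refers to the `i`-th appended sub-step, which is the alignment the list-valued properties of this class suggest.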

property embedding_logits

List of embedding logits for the individual sub-steps.

property entropies

List of entropies of the probability distributions for the individual sub-steps.
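As a rough illustration of what a per-sub-step entropy list contains, the sketch below computes the entropy of a categorical distribution from its logits, once per sub-step. It assumes categorical action distributions and uses a hypothetical helper, `categorical_entropy`, written in plain Python; how the library itself computes entropies is not stated here.

```python
import math
from typing import List


def categorical_entropy(logits: List[float]) -> float:
    """Entropy (in nats) of the categorical distribution defined by logits."""
    # Numerically stable log-sum-exp for the normalisation constant.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    log_p = [x - log_z for x in logits]
    # H = -sum_i p_i * log p_i
    return -sum(math.exp(lp) * lp for lp in log_p)


# Illustrative logits for two sub-steps (made-up values).
sub_step_logits = [[0.0, 0.0], [2.0, 0.0, 0.0]]
entropies = [categorical_entropy(logits) for logits in sub_step_logits]
```

The first entry is the entropy of a uniform distribution over two actions, `log(2)`; the second is lower than `log(3)` because one action is favoured.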

log_probs_for_actions(actions: List[Dict[str, torch.Tensor]]) → List[Dict[str, torch.Tensor]]

Compute the action log probabilities for the given actions.

Parameters: actions – The actions to use.

Returns: The computed action log probabilities.
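The input and output shapes in the signature (one dictionary per sub-step, keyed by action name) can be sketched as follows. This is an assumption-laden illustration, not the library's implementation: it assumes categorical actions given as integer indices, uses plain floats instead of `torch.Tensor`, and `log_softmax` and `log_probs_for_actions_sketch` are hypothetical helper names.

```python
import math
from typing import Dict, List


def log_softmax(logits: List[float]) -> List[float]:
    """Log probabilities of a categorical distribution given its logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def log_probs_for_actions_sketch(
    logits_per_step: List[Dict[str, List[float]]],
    actions: List[Dict[str, int]],
) -> List[Dict[str, float]]:
    """Look up the log probability of each chosen action, per sub-step."""
    if len(logits_per_step) != len(actions):
        raise ValueError("Expected one action dictionary per sub-step.")
    return [
        {name: log_softmax(step_logits[name])[index]
         for name, index in step_actions.items()}
        for step_logits, step_actions in zip(logits_per_step, actions)
    ]


# Illustrative values: two sub-steps, one categorical action each.
logits_per_step = [{"item": [0.0, 0.0]}, {"slot": [1.0, 1.0, 1.0, 1.0]}]
actions = [{"item": 1}, {"slot": 2}]
log_probs = log_probs_for_actions_sketch(logits_per_step, actions)
```

With uniform logits, the results are `log(1/2)` for the first sub-step and `log(1/4)` for the second, and the returned list mirrors the structure of `actions`.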

property prob_dist

List of probability dictionaries for the individual sub-steps.