PolicyOutput

class maze.core.agent.torch_policy_output.PolicyOutput

A structured representation of a policy output over a full (flat) environment step.

property action_logits: List of action logits for the individual sub-steps.

actor_ids() → List[maze.core.env.structured_env.ActorID]: List of actor IDs for the individual sub-steps.

append(value: maze.core.agent.torch_policy_output.PolicySubStepOutput): Append a given PolicySubStepOutput.

property embedding_logits: List of embedding logits for the individual sub-steps.

property entropies: List of entropies of the probability distributions of the individual sub-steps.

log_probs_for_actions(actions: List[Dict[str, torch.Tensor]]) → List[Dict[str, torch.Tensor]]: Compute the action log probabilities for the given actions.

Parameters: actions: The actions to use.

Returns: The computed action log probabilities.
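To illustrate the kind of computation behind log_probs_for_actions, here is a minimal sketch that assumes every action head is a categorical distribution parameterized by logits. It uses plain Python lists in place of torch.Tensor so that it runs without dependencies. The helper categorical_log_prob, the head name "move", and the standalone function signature are hypothetical and chosen for illustration only; they are not taken from Maze, whose method works on the probability distributions stored in the policy output.

```python
import math
from typing import Dict, List


def categorical_log_prob(logits: List[float], action: int) -> float:
    """Log probability of `action` under softmax(logits) (hypothetical helper)."""
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[action] - log_norm


def log_probs_for_actions(
    action_logits: List[Dict[str, List[float]]],
    actions: List[Dict[str, int]],
) -> List[Dict[str, float]]:
    """Return one dict of log probabilities per sub-step, keyed by action head."""
    return [
        {head: categorical_log_prob(step_logits[head], idx)
         for head, idx in step_actions.items()}
        for step_logits, step_actions in zip(action_logits, actions)
    ]


# Two sub-steps, each with a single (assumed) categorical head called "move".
action_logits = [{"move": [0.0, 0.0]}, {"move": [0.0, math.log(3.0)]}]
actions = [{"move": 0}, {"move": 1}]
log_probs = log_probs_for_actions(action_logits, actions)
```

The result mirrors the documented return type: a list with one dictionary per sub-step, where each dictionary maps an action head to the log probability of the chosen action.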

property prob_dist: List of probability dictionaries for the individual sub-steps.
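The members above all follow the same pattern: sub-step outputs are appended one at a time, and each accessor returns a list with one entry per sub-step. The sketch below mimics that pattern with stand-in classes so it runs without Maze or torch installed. SubStepOutputSketch, PolicyOutputSketch, the sub_steps field, and the tuple used as an actor ID are all hypothetical names for illustration; the real classes hold tensors and distribution objects, and their internals may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SubStepOutputSketch:
    """Stand-in for a single sub-step output (not the Maze class)."""
    actor_id: Tuple[int, int]              # assumed (step key, agent id) pair
    action_logits: Dict[str, List[float]]  # logits per action head


@dataclass
class PolicyOutputSketch:
    """Stand-in container that collects sub-step outputs in order."""
    sub_steps: List[SubStepOutputSketch] = field(default_factory=list)

    def append(self, value: SubStepOutputSketch) -> None:
        self.sub_steps.append(value)

    def actor_ids(self) -> List[Tuple[int, int]]:
        return [s.actor_id for s in self.sub_steps]

    @property
    def action_logits(self) -> List[Dict[str, List[float]]]:
        return [s.action_logits for s in self.sub_steps]


# Collect the outputs of two sub-steps of one flat environment step.
output = PolicyOutputSketch()
output.append(SubStepOutputSketch(actor_id=(0, 0),
                                  action_logits={"move": [0.1, 0.9]}))
output.append(SubStepOutputSketch(actor_id=(1, 0),
                                  action_logits={"target": [0.3, 0.3, 0.4]}))
```

Keeping the accessors as per-sub-step lists means index i of actor_ids() and index i of action_logits always refer to the same sub-step.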
