PolicySubStepOutput

class maze.core.agent.torch_policy_output.PolicySubStepOutput(action_logits: Dict[str, torch.Tensor], prob_dist: maze.distributions.dict.DictProbabilityDistribution, embedding_logits: Optional[Dict[str, torch.Tensor]], actor_id: maze.core.env.structured_env.ActorID)

Dataclass holding the output of the policy’s compute full output method for a single sub-step.

action_logits: Dict[str, torch.Tensor]: A dictionary mapping each action head to its logits, which are used to parameterize the distribution.

actor_id: maze.core.env.structured_env.ActorID: The ID of the actor that this output belongs to.

embedding_logits: Optional[Dict[str, torch.Tensor]]: The embedding output, if applicable; used as input to the critic network.

property entropy: The entropy of the probability distribution.

prob_dist: maze.distributions.dict.DictProbabilityDistribution: The DictProbabilityDistribution instance built from the action logits.
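
The relationship between the fields above can be illustrated with a minimal, standard-library-only sketch. It does not use Maze or torch: `SubStepOutputSketch`, `softmax`, and `categorical_entropy` are hypothetical names, plain lists of floats stand in for tensors, a plain tuple stands in for `ActorID`, and `prob_dist` is omitted because the distribution is computed directly from the logits. It further assumes that every action head is categorical and that heads are independent, in which case the joint entropy is the sum of the per-head entropies; whether `DictProbabilityDistribution` aggregates entropy this way is an assumption, not something stated on this page.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


def softmax(logits: List[float]) -> List[float]:
    """Convert logits to probabilities (shifted by the max for numerical stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_entropy(logits: List[float]) -> float:
    """Entropy (in nats) of the categorical distribution defined by the logits."""
    return -sum(p * math.log(p) for p in softmax(logits) if p > 0.0)


@dataclass
class SubStepOutputSketch:
    """Hypothetical stand-in for PolicySubStepOutput; not the Maze class."""

    # Action head name -> logits for that head (lists stand in for tensors).
    action_logits: Dict[str, List[float]]
    # None when no embedding is passed on to a critic.
    embedding_logits: Optional[Dict[str, List[float]]]
    # Plain tuple standing in for ActorID.
    actor_id: tuple

    @property
    def entropy(self) -> float:
        # Assumption: heads are independent, so the joint entropy
        # is the sum of the per-head entropies.
        return sum(categorical_entropy(v) for v in self.action_logits.values())


output = SubStepOutputSketch(
    action_logits={"move": [0.0, 0.0, 0.0, 0.0], "fire": [2.0, 0.0]},
    embedding_logits=None,
    actor_id=(0, 0),
)
print(output.entropy)
```

In this sketch the uniform "move" head contributes the maximum entropy for four choices (ln 4), while the peaked "fire" head contributes less, so the total reflects how undecided the policy is across all heads of the sub-step.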
