PolicySubStepOutput

class maze.core.agent.torch_policy_output.PolicySubStepOutput(action_logits: Dict[str, torch.Tensor], prob_dist: maze.distributions.dict.DictProbabilityDistribution, embedding_logits: Optional[Dict[str, torch.Tensor]], actor_id: maze.core.env.structured_env.ActorID)

Dataclass holding the output of the policy’s compute full output method.

action_logits: Dict[str, torch.Tensor]

A dictionary mapping each action head to its logits, from which the probability distribution is parameterized.

actor_id: maze.core.env.structured_env.ActorID

The ID of the actor that this output belongs to.

embedding_logits: Optional[Dict[str, torch.Tensor]]

The embedding output, if applicable. It is used as the input to the critic network.

property entropy

The entropy of the probability distribution.

prob_dist: maze.distributions.dict.DictProbabilityDistribution

The DictProbabilityDistribution instance parameterized by action_logits.
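
The following is a minimal, library-free sketch of how the four fields and the entropy property fit together. It is not Maze's implementation: SketchDictDistribution and SketchSubStepOutput are hypothetical stand-ins, plain Python lists replace torch.Tensor, a plain tuple replaces ActorID, and the sketch assumes that every action head is categorical and that the dictionary-level entropy is the sum of the per-head entropies.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Convert logits to probabilities in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


class SketchDictDistribution:
    """Hypothetical stand-in: one categorical distribution per action head."""

    def __init__(self, logits: Dict[str, List[float]]):
        self.probs = {head: softmax(values) for head, values in logits.items()}

    def entropy(self) -> float:
        # Assumption: heads are independent, so per-head entropies add up.
        return sum(
            -sum(p * math.log(p) for p in probs if p > 0.0)
            for probs in self.probs.values()
        )


@dataclass
class SketchSubStepOutput:
    """Hypothetical mirror of the four documented fields."""

    action_logits: Dict[str, List[float]]
    prob_dist: SketchDictDistribution
    embedding_logits: Optional[Dict[str, List[float]]]
    actor_id: Tuple[int, int]  # stand-in for ActorID

    @property
    def entropy(self) -> float:
        # The property simply delegates to the wrapped distribution.
        return self.prob_dist.entropy()


# Uniform logits over 2 and 4 actions: entropy is ln(2) + ln(4).
logits = {"move": [0.0, 0.0], "use": [0.0, 0.0, 0.0, 0.0]}
output = SketchSubStepOutput(
    action_logits=logits,
    prob_dist=SketchDictDistribution(logits),
    embedding_logits=None,  # no embedding output in this sketch
    actor_id=(0, 0),
)
print(round(output.entropy, 4))  # prints 2.0794
```

Keeping the logits alongside the distribution built from them lets a consumer read either the raw values or derived quantities such as the entropy without recomputing the distribution.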