PolicySubStepOutput
- class maze.core.agent.torch_policy_output.PolicySubStepOutput(action_logits: Dict[str, torch.Tensor], prob_dist: DictProbabilityDistribution, embedding_logits: Dict[str, torch.Tensor] | None, actor_id: ActorID)
Dataclass holding the output of the policy’s compute full output method for a single sub-step.
- action_logits: Dict[str, torch.Tensor]
A dictionary mapping each action head to its logits tensor; the probability distribution is parameterized from these logits.
- embedding_logits: Dict[str, torch.Tensor] | None
The embedding output, if applicable (None otherwise); used as the input to the critic network.
- property entropy: torch.Tensor
The entropy of the probability distribution.
- prob_dist: DictProbabilityDistribution
The DictProbabilityDistribution instance parameterized by action_logits.
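The relationship between the per-head logits and the entropy property can be illustrated with a small standard-library sketch. This is not Maze's implementation: SketchSubStepOutput and softmax are hypothetical names, plain lists stand in for torch.Tensor, a tuple stands in for ActorID, and the prob_dist field is left out because the distribution is derived from the logits on the fly. The sketch also assumes that every action head is categorical and that the entropy of the dictionary distribution is the sum of the per-head entropies.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


def softmax(logits: List[float]) -> List[float]:
    """Convert raw logits into categorical probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


@dataclass
class SketchSubStepOutput:
    """Hypothetical stand-in mirroring the documented fields with plain lists."""

    # Maps each action head name to its logits.
    action_logits: Dict[str, List[float]]
    # Optional embedding output; None when not applicable.
    embedding_logits: Optional[Dict[str, List[float]]]
    # Stand-in for ActorID, assumed here to be a (step_key, agent_id) pair.
    actor_id: tuple

    @property
    def entropy(self) -> float:
        """Sum of the categorical entropies of all action heads (assumption)."""
        total = 0.0
        for logits in self.action_logits.values():
            probs = softmax(logits)
            total -= sum(p * math.log(p) for p in probs if p > 0.0)
        return total


# Two heads with uniform logits: entropies ln(2) and ln(4) add up to ln(8).
out = SketchSubStepOutput(
    action_logits={"move": [0.0, 0.0], "jump": [0.0, 0.0, 0.0, 0.0]},
    embedding_logits=None,
    actor_id=(0, 0),
)
print(out.entropy)
```

In the real class the logits are tensors and the entropy is obtained from prob_dist, so the result is a torch.Tensor rather than a float.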