PolicySubStepOutput
-
class maze.core.agent.torch_policy_output.PolicySubStepOutput(action_logits: Dict[str, torch.Tensor], prob_dist: maze.distributions.dict.DictProbabilityDistribution, embedding_logits: Optional[Dict[str, torch.Tensor]], actor_id: maze.core.env.structured_env.ActorID)
Dataclass that holds the output of the policy’s compute-full-output method.
-
action_logits: Dict[str, torch.Tensor]
A dictionary of logits, keyed by action head name, from which the probability distribution is parameterized.
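To illustrate what parameterizing a distribution from a logits dictionary can look like, here is a minimal sketch in plain Python. It assumes every action head is categorical and uses lists of floats in place of torch.Tensor. The helper `softmax` and the head names `move` and `pick` are hypothetical and not part of Maze.

```python
import math
from typing import Dict, List


def softmax(logits: List[float]) -> List[float]:
    """Convert raw logits into categorical probabilities."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits dictionary: action head name -> logits for that head.
action_logits: Dict[str, List[float]] = {
    "move": [0.0, 0.0, 0.0, 0.0],
    "pick": [2.0, 0.0],
}

# One categorical distribution per action head (an assumption for this sketch).
head_probs = {head: softmax(logits) for head, logits in action_logits.items()}
```

Equal logits yield a uniform distribution for that head, while a larger logit shifts probability mass toward the corresponding action.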
-
actor_id: maze.core.env.structured_env.ActorID
The ID of the actor that produced this output.
-
embedding_logits: Optional[Dict[str, torch.Tensor]]
The embedding output, if applicable. It is used as the input to the critic network.
-
property entropy
The entropy of the probability distribution.
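As a rough illustration of an entropy property over a dictionary of per-head distributions, the sketch below assumes independent categorical heads, in which case the joint entropy is the sum of the per-head entropies. Whether Maze aggregates entropy this way is not stated here. The class `StandInSubStepOutput` and the function `categorical_entropy` are hypothetical stand-ins written in plain Python, not the Maze implementation.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


def categorical_entropy(probs: List[float]) -> float:
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


@dataclass
class StandInSubStepOutput:
    """Hypothetical stand-in, NOT the Maze class: holds per-head probabilities."""

    head_probs: Dict[str, List[float]]

    @property
    def entropy(self) -> float:
        # Assumption: heads are independent, so per-head entropies add up.
        return sum(categorical_entropy(p) for p in self.head_probs.values())


out = StandInSubStepOutput(head_probs={"move": [0.25] * 4, "pick": [0.5, 0.5]})
total_entropy = out.entropy
```

Exposing entropy as a property keeps call sites short, for example when adding an entropy bonus to a policy loss.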
-
prob_dist: maze.distributions.dict.DictProbabilityDistribution
The DictProbabilityDistribution instance corresponding to action_logits.
-