Policies, Critics and Agents
This page contains the reference documentation for policies, critics and agents.
maze.core.agent
Policies:

- Generic flat policy interface.
- Structured policy class designed to work with structured environments.
- Encapsulates multiple torch policies along with a distribution mapper for training and rollouts in structured environments.
- Dataclass holding the output of the policy's compute-full-output method.
- A structured representation of a policy output over a full (flat) environment step.
- Encapsulates one or more policies identified by policy IDs.
- Implements a random structured policy.
- Dummy structured policy for the CartPole env.
- Structured policy used for rollouts of trained models.
- A replay-action-record policy that executes, in each (sub-)step, the action stored in the provided action record.
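To make the flat policy interface above concrete, here is a minimal sketch of a random structured policy: it ignores the observation and samples uniformly from the action space of the sub-step the querying actor belongs to. The class and method names (`RandomPolicySketch`, `compute_action`, the `action_spaces` mapping) are illustrative assumptions, not the actual Maze API.

```python
from typing import Any, Dict, Optional


class RandomPolicySketch:
    """Sketch of a random structured policy.

    `action_spaces` maps a sub-step (policy) ID to a gym-style action
    space exposing `.sample()`. All names here are illustrative, not
    the actual Maze API.
    """

    def __init__(self, action_spaces: Dict[Any, Any]):
        self.action_spaces = action_spaces

    def compute_action(self, observation: Dict[str, Any],
                       actor_id: Optional[Any] = None) -> Any:
        # Ignore the observation and sample uniformly from the action
        # space associated with the actor's sub-step.
        space = self.action_spaces[actor_id]
        return space.sample()
```

A rollout loop would call `compute_action` once per sub-step, passing the ID of the actor that is currently active.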
Critics:

- Structured state critic class designed to work with structured environments.
- Holds the output of a state critic for an individual env step.
- Holds the output of a critic for one full flat env step.
- State critic input for a single sub-step of the env, holding the tensor dict and the actor IDs indicating where the embedding logits were retrieved if applicable, otherwise just the corresponding actor.
- State critic output, defined as its own type since it has to be explicitly built to be compatible with shared embedding networks.
- Encapsulates multiple torch state critics for training in structured environments.
- One critic is shared across all sub-steps or actors (the default for standard gym-style environments).
- Each sub-step or actor gets its own individual critic.
- The first sub-step gets a regular critic; subsequent sub-steps predict a delta w.r.t. the previous sub-step's value.
- Structured state-action critic class designed to work with structured environments.
- Encapsulates multiple torch state-action critics for training in structured environments.
- One critic is shared across all sub-steps or actors (the default for standard gym-style environments).
- Each sub-step or actor gets its own individual critic.
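The delta-critic scheme listed above can be sketched in a few lines: the first sub-step's critic predicts a value directly, and each subsequent critic predicts only a correction that is added to the previous sub-step's value. The function name and the representation of critics as plain callables are illustrative assumptions, not the actual Maze API.

```python
from typing import Any, Callable, Dict, List


def delta_critic_values(observations: List[Dict[str, Any]],
                        critics: List[Callable[[Dict[str, Any]], float]]) -> List[float]:
    """Sketch of the delta state critic: the first sub-step gets a
    regular value prediction, subsequent sub-steps predict a delta
    w.r.t. the previous sub-step's value. Illustrative only."""
    values: List[float] = []
    for step, (obs, critic) in enumerate(zip(observations, critics)):
        if step == 0:
            # Regular critic for the first sub-step.
            values.append(critic(obs))
        else:
            # Subsequent critics predict a delta on top of the
            # previous sub-step's value.
            values.append(values[-1] + critic(obs))
    return values
```

The shared-critic and per-step-critic variants differ only in whether `critics` holds one network queried repeatedly or one network per sub-step.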
Models:

- Base class for any torch model.
- Encapsulates a structured torch policy and critic for training actor-critic algorithms in structured environments.
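The actor-critic container described above can be pictured as a thin wrapper that bundles a policy with a critic so training code addresses both through one object. All names here (`ActorCriticSketch`, `compute_action`, `predict_value`) are illustrative assumptions, not the actual Maze API.

```python
from typing import Any, Dict, Tuple


class ActorCriticSketch:
    """Sketch of an actor-critic container: bundles a structured
    policy with a critic for training. Illustrative names only."""

    def __init__(self, policy: Any, critic: Any):
        self.policy = policy
        self.critic = critic

    def compute_action_and_value(self, observation: Dict[str, Any],
                                 actor_id: Any = None) -> Tuple[Any, float]:
        # Training loops typically need both the sampled action and
        # the value estimate for the same observation.
        action = self.policy.compute_action(observation, actor_id)
        value = self.critic.predict_value(observation, actor_id)
        return action, value
```

In a real setup both members would be torch modules, and the container would also expose their parameters jointly to the optimizer.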