Policies, Critics and Agents
This page contains the reference documentation for policies, critics and agents.
maze.core.agent
Policies:

- Generic flat policy interface.
- Structured policy class designed to work with structured environments.
- Encapsulates multiple torch policies along with a distribution mapper for training and rollouts in structured environments.
- Dataclass holding the output of the policy's compute-full-output method.
- A structured representation of a policy output over a full (flat) environment step.
- Encapsulates one or more policies identified by policy IDs.
- Implements a random structured policy.
- Dummy structured policy for the CartPole env.
- Structured policy used for rollouts of trained models.
- A replay-action-record policy that executes, in each (sub-)step, the action stored in the provided action record.
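To make the flat policy interface above concrete, here is a minimal sketch of a random structured policy: it ignores the observation and samples uniformly from the action space of the sub-step the querying actor belongs to. The class and method names (`RandomPolicySketch`, `compute_action`, the `action_spaces` mapping) are illustrative assumptions, not the actual Maze API.

```python
from typing import Any, Dict, Optional


class RandomPolicySketch:
    """Sketch of a random structured policy.

    `action_spaces` maps a sub-step (policy) ID to a gym-style action
    space exposing `.sample()`. All names here are illustrative, not
    the actual Maze API.
    """

    def __init__(self, action_spaces: Dict[Any, Any]):
        self.action_spaces = action_spaces

    def compute_action(self, observation: Dict[str, Any],
                       actor_id: Optional[Any] = None) -> Any:
        # Ignore the observation and sample uniformly from the action
        # space associated with the actor's sub-step.
        space = self.action_spaces[actor_id]
        return space.sample()
```

A rollout loop would call `compute_action` once per sub-step, passing the ID of the actor that is currently active.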
Critics:

- Structured state critic class designed to work with structured environments.
- Holds the output of a state critic for an individual env step.
- Holds the output of a critic for one full flat env step.
- State critic input for a single sub-step of the env, holding the tensor dict and the actor IDs indicating where the embedding logits were retrieved if applicable, otherwise just the corresponding actor.
- State critic output, defined as its own type since it has to be explicitly built to be compatible with shared embedding networks.
- Encapsulates multiple torch state critics for training in structured environments.
- One critic is shared across all sub-steps or actors (the default for standard gym-style environments).
- Each sub-step or actor gets its own individual critic.
- The first sub-step gets a regular critic; subsequent sub-steps predict a delta w.r.t. the previous sub-step's value.
- Structured state-action critic class designed to work with structured environments.
- Encapsulates multiple torch state-action critics for training in structured environments.
- One critic is shared across all sub-steps or actors (the default for standard gym-style environments).
- Each sub-step or actor gets its own individual critic.
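The delta-critic scheme listed above can be sketched in a few lines: the first sub-step's critic predicts a value directly, and each subsequent critic predicts only a correction that is added to the previous sub-step's value. The function name and the representation of critics as plain callables are illustrative assumptions, not the actual Maze API.

```python
from typing import Any, Callable, Dict, List


def delta_critic_values(observations: List[Dict[str, Any]],
                        critics: List[Callable[[Dict[str, Any]], float]]) -> List[float]:
    """Sketch of the delta state critic: the first sub-step gets a
    regular value prediction, subsequent sub-steps predict a delta
    w.r.t. the previous sub-step's value. Illustrative only."""
    values: List[float] = []
    for step, (obs, critic) in enumerate(zip(observations, critics)):
        if step == 0:
            # Regular critic for the first sub-step.
            values.append(critic(obs))
        else:
            # Subsequent critics predict a delta on top of the
            # previous sub-step's value.
            values.append(values[-1] + critic(obs))
    return values
```

The shared-critic and per-step-critic variants differ only in whether `critics` holds one network queried repeatedly or one network per sub-step.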
Models:

- Base class for any torch model.
- Encapsulates a structured torch policy and critic for training actor-critic algorithms in structured environments.
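The actor-critic container described above can be pictured as a thin wrapper that bundles a policy with a critic so training code addresses both through one object. All names here (`ActorCriticSketch`, `compute_action`, `predict_value`) are illustrative assumptions, not the actual Maze API.

```python
from typing import Any, Dict, Tuple


class ActorCriticSketch:
    """Sketch of an actor-critic container: bundles a structured
    policy with a critic for training. Illustrative names only."""

    def __init__(self, policy: Any, critic: Any):
        self.policy = policy
        self.critic = critic

    def compute_action_and_value(self, observation: Dict[str, Any],
                                 actor_id: Any = None) -> Tuple[Any, float]:
        # Training loops typically need both the sampled action and
        # the value estimate for the same observation.
        action = self.policy.compute_action(observation, actor_id)
        value = self.critic.predict_value(observation, actor_id)
        return action, value
```

In a real setup both members would be torch modules, and the container would also expose their parameters jointly to the optimizer.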