Policies, Critics and Agents

This page contains the reference documentation for policies, critics and agents.

maze.core.agent

Policies:

FlatPolicy

Generic flat policy interface.

Policy

Structured policy class designed to work with structured environments.

TorchPolicy

Encapsulates multiple torch policies along with a distribution mapper for training and rollouts in structured environments.

PolicySubStepOutput

Dataclass for holding the output of the policy's compute-full-output method for a single sub-step.

PolicyOutput

A structured representation of a policy output over a full (flat) environment step.

DefaultPolicy

Encapsulates one or more policies identified by policy IDs.

RandomPolicy

Implements a random structured policy.

DummyCartPolePolicy

Dummy structured policy for the CartPole env.

SerializedTorchPolicy

Structured policy used for rollouts of trained models.

ReplayRecordedActionsPolicy

A replay policy that executes, in each (sub-)step, the action stored in the provided action record.
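
The sketch below illustrates, in plain Python, the general idea behind a random structured policy: one action sampler per sub-step, queried through a compute_action-style method. The class name, method signature and sampler layout are assumptions made for this illustration only and do not reproduce the library interfaces listed above:

    # Illustrative only: a minimal random "structured" policy sketch.
    # `compute_action` and its arguments loosely mirror the Policy interface
    # described above; the exact signatures here are assumptions.
    from typing import Any, Callable, Dict, Optional
    import random


    class SketchRandomPolicy:
        """Samples a random action for each sub-step from a dict of samplers.

        `action_samplers` maps a sub-step (policy) ID to a zero-argument
        callable returning a sampled action (e.g. a gym space's `.sample`).
        """

        def __init__(self, action_samplers: Dict[Any, Callable[[], Any]]):
            self.action_samplers = action_samplers

        def compute_action(self, observation: Dict[str, Any],
                           actor_id: Optional[Any] = None,
                           deterministic: bool = False) -> Any:
            # Ignore the observation and sample uniformly from the sub-step's space.
            key = actor_id if actor_id is not None else next(iter(self.action_samplers))
            return self.action_samplers[key]()


    # Usage sketch: one sampler per sub-step of a structured environment.
    policy = SketchRandomPolicy({0: lambda: random.choice([0, 1]),
                                 1: lambda: random.uniform(-1.0, 1.0)})
    print(policy.compute_action({"obs": 0.0}, actor_id=0))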

Critics:

StateCritic

Structured state critic class designed to work with structured environments.

StateCriticStepOutput

State critic step output holds the output of a critic for an individual env step.

StateCriticOutput

Critic output holds the output of a critic for one full flat env step.

StateCriticStepInput

State critic input for a single sub-step of the env, holding the tensor_dict and the actor_ids corresponding to where the embedding logits were retrieved if applicable, otherwise just the corresponding actor.

StateCriticInput

State critic input defined as its own type, since it has to be explicitly built to be compatible with shared embedding networks.

TorchStateCritic

Encapsulates multiple torch state critics for training in structured environments.

TorchSharedStateCritic

One critic is shared across all sub-steps or actors (default to use for standard gym-style environments).

TorchStepStateCritic

Each sub-step or actor gets its individual critic.

TorchDeltaStateCritic

First sub-step gets a regular critic; subsequent sub-steps predict a delta w.r.t. the previous sub-step's critic output.

StateActionCritic

Structured state action critic class designed to work with structured environments.

TorchStateActionCritic

Encapsulates multiple torch state action critics for training in structured environments.

TorchSharedStateActionCritic

One critic is shared across all sub-steps or actors (default to use for standard gym-style environments).

TorchStepStateActionCritic

Each sub-step or actor gets its individual critic.
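
To make the shared vs. per-step distinction concrete, the following sketch contrasts a single value network reused for every sub-step with one value network per sub-step, roughly in the spirit of TorchSharedStateCritic and TorchStepStateCritic. All class names and the predict_value method are assumptions for this sketch, not the library API:

    # Illustrative only: "shared" vs. "step" state critics.
    import torch
    from torch import nn


    class SketchSharedStateCritic(nn.Module):
        """One value head reused for every sub-step / actor."""

        def __init__(self, obs_dim: int):
            super().__init__()
            self.value_net = nn.Sequential(
                nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, 1))

        def predict_value(self, obs: torch.Tensor, sub_step: int) -> torch.Tensor:
            # The same network is evaluated regardless of the sub-step.
            return self.value_net(obs).squeeze(-1)


    class SketchStepStateCritic(nn.Module):
        """An individual value head per sub-step / actor."""

        def __init__(self, obs_dim: int, n_sub_steps: int):
            super().__init__()
            self.value_nets = nn.ModuleList(
                nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, 1))
                for _ in range(n_sub_steps))

        def predict_value(self, obs: torch.Tensor, sub_step: int) -> torch.Tensor:
            # Each sub-step selects its own critic network.
            return self.value_nets[sub_step](obs).squeeze(-1)


    obs = torch.randn(4, 8)                               # batch of 4 observations, dim 8
    shared = SketchSharedStateCritic(obs_dim=8)
    stepwise = SketchStepStateCritic(obs_dim=8, n_sub_steps=2)
    print(shared.predict_value(obs, sub_step=0).shape)    # torch.Size([4])
    print(stepwise.predict_value(obs, sub_step=1).shape)  # torch.Size([4])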

Models:

TorchModel

Base class for any torch model.

TorchActorCritic

Encapsulates a structured torch policy and critic for training actor-critic algorithms in structured environments.
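
As a rough picture of what an actor-critic container bundles, the sketch below combines a policy head (action logits) and a value head behind one shared torso so that a trainer can query both; the class layout is an assumption for illustration, not the TorchActorCritic implementation:

    # Illustrative only: a minimal actor-critic container sketch.
    import torch
    from torch import nn


    class SketchActorCritic(nn.Module):
        def __init__(self, obs_dim: int, n_actions: int):
            super().__init__()
            self.shared = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh())
            self.policy_head = nn.Linear(64, n_actions)  # action logits
            self.value_head = nn.Linear(64, 1)           # state value

        def forward(self, obs: torch.Tensor):
            h = self.shared(obs)
            return self.policy_head(h), self.value_head(h).squeeze(-1)


    model = SketchActorCritic(obs_dim=4, n_actions=2)
    logits, value = model(torch.randn(3, 4))
    print(logits.shape, value.shape)  # torch.Size([3, 2]) torch.Size([3])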