A2CAlgorithmConfig

class maze.train.trainers.a2c.a2c_algorithm_config.A2CAlgorithmConfig(n_epochs: int, epoch_length: int, patience: int, critic_burn_in_epochs: int, n_rollout_steps: int, lr: float, gamma: float, gae_lambda: float, policy_loss_coef: float, value_loss_coef: float, entropy_coef: float, max_grad_norm: float, device: str, rollout_evaluator: maze.train.trainers.common.evaluators.rollout_evaluator.RolloutEvaluator)

Algorithm parameters for multi-step A2C model.
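As a rough orientation, the constructor arguments can be collected in a plain dictionary before being passed to the class. The sketch below assumes nothing beyond the signature shown above. All values are illustrative placeholders, not defaults or recommendations from the library, and `rollout_evaluator` is left as `None` only so the snippet runs without Maze installed. Real use needs a `RolloutEvaluator` instance.

```python
# Illustrative keyword arguments mirroring the A2CAlgorithmConfig signature.
# The values are placeholders chosen for this sketch, NOT library defaults.
a2c_kwargs = {
    "n_epochs": 5,
    "epoch_length": 25,
    "patience": 15,
    "critic_burn_in_epochs": 0,
    "n_rollout_steps": 100,
    "lr": 0.0005,
    "gamma": 0.98,
    "gae_lambda": 1.0,
    "policy_loss_coef": 1.0,
    "value_loss_coef": 0.5,
    "entropy_coef": 0.00025,
    "max_grad_norm": 0.0,
    "device": "cpu",
    "rollout_evaluator": None,  # placeholder; a RolloutEvaluator is expected here
}

# With Maze installed and a real evaluator in place, construction would be:
# from maze.train.trainers.a2c.a2c_algorithm_config import A2CAlgorithmConfig
# config = A2CAlgorithmConfig(**a2c_kwargs)
```

Keeping the arguments in one dictionary makes it easy to check that every field of the signature is set before the config object is built.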

critic_burn_in_epochs: int

Number of critic (value function) burn-in epochs

device: str

Either “cpu” or “cuda”

entropy_coef: float

weight of entropy loss

epoch_length: int

number of updates per epoch

gae_lambda: float

bias vs. variance trade-off factor for GAE

gamma: float

discounting factor
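
To make the roles of `gamma` and `gae_lambda` concrete, here is a minimal sketch of generalized advantage estimation (GAE) over a single rollout. It is a textbook formulation in plain Python, not the library's implementation. The function name and the `rewards`, `values` and `bootstrap_value` arguments are hypothetical, and the sketch assumes no episode ends inside the rollout.

```python
from typing import List


def compute_gae(rewards: List[float], values: List[float],
                bootstrap_value: float, gamma: float,
                gae_lambda: float) -> List[float]:
    """Return GAE advantages for one rollout (assumes no terminal step inside)."""
    advantages = [0.0] * len(rewards)
    next_value = bootstrap_value  # value estimate for the state after the rollout
    running = 0.0
    # Walk backwards so each step can reuse the advantage of its successor.
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error.
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of TD errors.
        running = delta + gamma * gae_lambda * running
        advantages[t] = running
        next_value = values[t]
    return advantages
```

With `gae_lambda = 0` each advantage reduces to the one-step TD error (lower variance, more bias). With `gae_lambda = 1` it becomes the discounted return minus the value estimate (less bias, higher variance).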

lr: float

learning rate

max_grad_norm: float

The maximum allowed gradient norm during training
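
A common way to enforce a maximum gradient norm is to rescale all gradients by a shared factor whenever their global L2 norm exceeds the limit. The sketch below shows that idea on a flat list of floats. It is an assumption about the general technique, not a description of how the trainer applies `max_grad_norm` internally, and the function name is hypothetical.

```python
import math
from typing import List, Tuple


def clip_by_global_norm(grads: List[float],
                        max_grad_norm: float) -> Tuple[List[float], float]:
    """Rescale grads so their L2 norm is at most max_grad_norm.

    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    # Nothing to do when the norm is already within the limit (or is zero).
    if total_norm <= max_grad_norm or total_norm == 0.0:
        return list(grads), total_norm
    scale = max_grad_norm / total_norm
    return [g * scale for g in grads], total_norm
```

Because every component is multiplied by the same factor, the direction of the update is preserved and only its length is reduced.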

n_epochs: int

number of epochs to train

n_rollout_steps: int

Number of steps taken for each rollout

patience: int

number of steps used for early stopping

policy_loss_coef: float

weight of policy loss

rollout_evaluator: maze.train.trainers.common.evaluators.rollout_evaluator.RolloutEvaluator

Rollout evaluator.

value_loss_coef: float

weight of value loss
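
The three coefficients `policy_loss_coef`, `value_loss_coef` and `entropy_coef` are typically combined into a single scalar objective. The following sketch shows one common convention, in which the entropy term is subtracted so that higher entropy lowers the loss. The sign convention and the function name are assumptions of this example. The library may define its entropy loss differently.

```python
def a2c_total_loss(policy_loss: float, value_loss: float, entropy: float,
                   policy_loss_coef: float, value_loss_coef: float,
                   entropy_coef: float) -> float:
    """Weighted sum of the A2C loss terms (entropy acts as a bonus here)."""
    return (policy_loss_coef * policy_loss
            + value_loss_coef * value_loss
            - entropy_coef * entropy)  # assumed sign: reward exploration
```

For example, with a policy loss of 1.0, a value loss of 2.0, an entropy of 0.5 and coefficients 1.0, 0.5 and 0.01, the weighted terms are 1.0, 1.0 and 0.005, so the total is approximately 1.995.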