ESAlgorithmConfig

class maze.train.trainers.es.es_algorithm_config.ESAlgorithmConfig(n_epochs: int, n_rollouts_per_update: int, n_timesteps_per_update: int, max_steps: int, optimizer: Any, l2_penalty: float, noise_stddev: float, policy_wrapper: Policy | None)

Algorithm parameters for the evolution strategies (ES) trainer. Note: pass 0 as n_epochs to train indefinitely.
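The n_epochs = 0 convention can be illustrated with a small sketch of an epoch counter. The helper `epoch_indices` is hypothetical and only shows the documented semantics (0 means no upper bound); it is not part of the Maze API.

```python
import itertools
from typing import Iterator


def epoch_indices(n_epochs: int) -> Iterator[int]:
    """Yield epoch indices (hypothetical helper, not Maze API).

    n_epochs == 0 means 'train indefinitely', so the iterator never ends.
    """
    if n_epochs < 0:
        raise ValueError("n_epochs must be >= 0")
    # itertools.count() yields 0, 1, 2, ... forever, mirroring the 0 = indefinite rule
    return itertools.count() if n_epochs == 0 else iter(range(n_epochs))


# A bounded run of three epochs
print(list(epoch_indices(3)))  # [0, 1, 2]
```

With n_epochs = 0 the loop has to be ended externally, for example by interrupting the run.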

l2_penalty: float: L2 weight regularization coefficient.

max_steps: int: Maximum number of steps per episode rollout. Set to 0 to disable this limit.
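A minimal sketch of how a step cap truncates a rollout. The `step_fn` callable is a hypothetical stand-in for an environment step that returns True when the episode ends on its own; the real trainer interacts with a Maze environment instead.

```python
from typing import Callable


def rollout_length(step_fn: Callable[[int], bool], max_steps: int) -> int:
    """Run one episode and return the number of steps taken.

    step_fn(t) is a hypothetical environment step returning True on natural
    termination; max_steps == 0 disables the cap.
    """
    t = 0
    while True:
        done = step_fn(t)
        t += 1
        # Stop on natural termination, or once the (enabled) step cap is reached
        if done or (max_steps > 0 and t >= max_steps):
            return t


def natural_end(t: int) -> bool:
    # Toy episode that terminates on its own at step index 9 (10 steps total)
    return t == 9


print(rollout_length(natural_end, max_steps=0))  # 10 (no cap)
print(rollout_length(natural_end, max_steps=4))  # 4 (truncated)
```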

n_rollouts_per_update: int: Minimum number of episode rollouts per training iteration (epoch).

n_timesteps_per_update: int: Minimum number of cumulative environment steps per training iteration (epoch). A training iteration finishes only once both the given number of episodes and the given number of steps have been reached. One of the two parameters can be set to 0, in which case only the other criterion applies.
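The combined stopping rule can be sketched as follows. This assumes the criterion is checked after each complete episode; `iteration_finished` and `collect_iteration` are hypothetical helpers, and episodes are represented only by their lengths.

```python
import itertools
from typing import Iterable, Tuple


def iteration_finished(n_rollouts: int, n_timesteps: int,
                       n_rollouts_per_update: int, n_timesteps_per_update: int) -> bool:
    # Both minimums must be met; a minimum of 0 is always satisfied,
    # so setting one parameter to 0 leaves only the other criterion active.
    return n_rollouts >= n_rollouts_per_update and n_timesteps >= n_timesteps_per_update


def collect_iteration(episode_lengths: Iterable[int], n_rollouts_per_update: int,
                      n_timesteps_per_update: int) -> Tuple[int, int]:
    """Count complete episodes (given by their lengths) until the iteration may end."""
    n_rollouts = n_timesteps = 0
    for length in episode_lengths:
        n_rollouts += 1
        n_timesteps += length
        if iteration_finished(n_rollouts, n_timesteps,
                              n_rollouts_per_update, n_timesteps_per_update):
            break
    return n_rollouts, n_timesteps


# Every toy episode lasts 50 steps.
print(collect_iteration(itertools.repeat(50), 2, 300))  # (6, 300): step minimum dominates
print(collect_iteration(itertools.repeat(50), 2, 0))    # (2, 100): only the rollout minimum applies
```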

noise_stddev: float: Scaling factor (standard deviation) of the random noise applied during training.
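To show where noise_stddev enters, here is a minimal sketch of the standard antithetic ES gradient estimate, in which each parameter vector is perturbed by noise_stddev times standard Gaussian noise. Pure Python lists stand in for policy parameters, and `es_gradient` is a hypothetical helper; production ES implementations commonly add refinements such as fitness shaping, which are omitted here.

```python
import random
from typing import Callable, List


def es_gradient(theta: List[float], fitness: Callable[[List[float]], float],
                noise_stddev: float, n_pairs: int, rng: random.Random) -> List[float]:
    """Antithetic ES estimate of the gradient of E[fitness(theta + noise_stddev * eps)]."""
    grad = [0.0] * len(theta)
    for _ in range(n_pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in theta]
        # Evaluate the mirrored pair theta + sigma * eps and theta - sigma * eps
        plus = fitness([t + noise_stddev * e for t, e in zip(theta, eps)])
        minus = fitness([t - noise_stddev * e for t, e in zip(theta, eps)])
        for i, e in enumerate(eps):
            grad[i] += (plus - minus) * e
    # Each pair contributes (F+ - F-) * eps / (2 * sigma); average over all pairs
    scale = 1.0 / (2.0 * n_pairs * noise_stddev)
    return [g * scale for g in grad]


def fitness(x: List[float]) -> float:
    # Toy objective with its maximum at x = [3, 3]; analytic gradient at 0 is [6, 6]
    return -sum((xi - 3.0) ** 2 for xi in x)


grad = es_gradient([0.0, 0.0], fitness, noise_stddev=0.1, n_pairs=2000, rng=random.Random(0))
print(grad)  # should be close to the analytic gradient [6.0, 6.0]
```

Larger noise_stddev values explore further from the current parameters; smaller values give a more local gradient estimate.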

optimizer: Any: Optimizer used to update the policy parameters from the sampled gradient estimate.
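A sketch of one parameter update that combines the sampled gradient, l2_penalty and an optimizer. It assumes one common formulation (as in the OpenAI ES reference code): the fitness ascent direction is negated into a loss gradient and the L2 term l2_penalty * theta is added before the optimizer step. The `SGD` class and `es_update` function are hypothetical stand-ins, not the optimizer objects Maze accepts.

```python
from typing import List


class SGD:
    """Minimal gradient-descent optimizer (hypothetical stand-in for the optimizer field)."""

    def __init__(self, lr: float) -> None:
        self.lr = lr

    def step(self, theta: List[float], loss_grad: List[float]) -> List[float]:
        # Move against the loss gradient
        return [t - self.lr * g for t, g in zip(theta, loss_grad)]


def es_update(theta: List[float], fitness_grad: List[float],
              optimizer: SGD, l2_penalty: float) -> List[float]:
    # ES estimates an ascent direction on fitness; negate it to obtain a loss gradient,
    # then add l2_penalty * theta, which pulls the weights toward zero.
    loss_grad = [-g + l2_penalty * t for g, t in zip(fitness_grad, theta)]
    return optimizer.step(theta, loss_grad)


new_theta = es_update([1.0, -2.0], [0.5, 0.5], SGD(lr=0.1), l2_penalty=0.01)
print(new_theta)  # approximately [1.049, -1.948]
```

Swapping `SGD` for an adaptive optimizer such as Adam changes only the `step` rule; the L2 term is applied to the gradient in the same way.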

policy_wrapper: Policy | None: Optional wrapper that adds simulation logic or heuristics on top of a TorchPolicy.