ESAlgorithmConfig

class maze.train.trainers.es.es_algorithm_config.ESAlgorithmConfig(n_epochs: int, n_rollouts_per_update: int, n_timesteps_per_update: int, max_steps: int, optimizer: Any, l2_penalty: float, noise_stddev: float, policy_wrapper: Optional[maze.core.agent.policy.Policy])

Algorithm parameters for the evolution strategies (ES) trainer. Note: Set n_epochs to 0 to train indefinitely.
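The n_epochs convention (0 means no upper bound) can be illustrated with a minimal sketch. The helper name epoch_indices is hypothetical and not part of the library; it only shows one way a training loop might interpret the value.

```python
import itertools
from typing import Iterator


def epoch_indices(n_epochs: int) -> Iterator[int]:
    """Return an iterator over epoch indices.

    Hypothetical helper (not a library function): n_epochs == 0 is treated
    as "train indefinitely", any positive value as a fixed epoch budget.
    """
    if n_epochs < 0:
        raise ValueError("n_epochs must be >= 0")
    # itertools.count() never terminates, which models indefinite training.
    return itertools.count() if n_epochs == 0 else iter(range(n_epochs))


# A bounded run visits exactly n_epochs indices.
bounded = list(epoch_indices(3))

# An unbounded run has to be cut off by the caller, here after five epochs.
unbounded_head = list(itertools.islice(epoch_indices(0), 5))
```

With n_epochs set to 0, a loop of this kind only ends when something else interrupts it, for example a manual stop.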

l2_penalty: float

L2 weight regularization coefficient.
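How the coefficient enters the update is not spelled out here. The sketch below assumes the common weight-decay form, where l2_penalty times each parameter is added to the gradient of the quantity being minimized. The function name l2_regularized_gradient is hypothetical.

```python
from typing import List


def l2_regularized_gradient(
    grad: List[float], params: List[float], l2_penalty: float
) -> List[float]:
    """Add an L2 weight-decay term to a gradient.

    Assumption: the gradient belongs to a loss that is minimized, so the
    penalty term l2_penalty * p pulls every parameter p towards zero.
    """
    if len(grad) != len(params):
        raise ValueError("grad and params must have the same length")
    return [g + l2_penalty * p for g, p in zip(grad, params)]


# With l2_penalty = 0.5, the parameters [2.0, 4.0] contribute [1.0, 2.0].
regularized = l2_regularized_gradient([1.0, -1.0], [2.0, 4.0], 0.5)
```

Setting l2_penalty to 0.0 leaves the gradient unchanged in this formulation.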

max_steps: int

Maximum number of steps per episode rollout. Set to 0 to disable the limit.
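The effect of the step limit can be sketched with a simplified rollout loop. Both run_episode and the step_fn callback are hypothetical stand-ins for an environment interaction, not library functions.

```python
from typing import Callable, Tuple


def run_episode(
    step_fn: Callable[[], Tuple[float, bool]], max_steps: int
) -> Tuple[float, int]:
    """Roll out one episode, optionally truncated after max_steps steps.

    step_fn is a hypothetical callback returning (reward, done) for one
    environment step. max_steps == 0 disables the truncation.
    """
    total_reward, n_steps, done = 0.0, 0, False
    while not done:
        reward, done = step_fn()
        total_reward += reward
        n_steps += 1
        # Truncate the rollout once the step budget is used up.
        if max_steps > 0 and n_steps >= max_steps:
            break
    return total_reward, n_steps


# An episode that never terminates on its own is cut off after 5 steps.
truncated = run_episode(lambda: (1.0, False), max_steps=5)

# With the limit disabled, the episode runs until the environment says done.
_steps = iter([(1.0, False), (1.0, False), (1.0, True)])
natural = run_episode(lambda: next(_steps), max_steps=0)
```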

n_rollouts_per_update: int

Minimum number of episode rollouts per training iteration (one iteration equals one epoch).

n_timesteps_per_update: int

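Minimum number of cumulative environment steps per training iteration (one iteration equals one epoch). A training iteration finishes only once both the minimum number of episode rollouts (n_rollouts_per_update) AND the minimum number of steps (n_timesteps_per_update) have been reached. One of the two parameters can be set to 0, in which case only the other one applies.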
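The stopping rule for a training iteration can be written as a single boolean check. The function iteration_finished and its first two argument names are hypothetical; only n_rollouts_per_update and n_timesteps_per_update come from the configuration.

```python
def iteration_finished(
    n_rollouts_done: int,
    n_steps_done: int,
    n_rollouts_per_update: int,
    n_timesteps_per_update: int,
) -> bool:
    """Return True once both minimum requirements are met.

    A threshold of 0 is satisfied by any count, which is why setting one of
    the two parameters to 0 leaves only the other one in effect.
    """
    enough_rollouts = n_rollouts_done >= n_rollouts_per_update
    enough_steps = n_steps_done >= n_timesteps_per_update
    return enough_rollouts and enough_steps


# 10 rollouts are done, but only 800 of the required 1000 steps: keep sampling.
still_running = iteration_finished(10, 800, 10, 1000)

# Both minimums reached: the iteration ends.
finished = iteration_finished(12, 1000, 10, 1000)

# Step requirement disabled with 0: only the rollout count matters.
rollouts_only = iteration_finished(10, 3, 10, 0)
```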

noise_stddev: float

The scaling factor of the random noise applied during training.
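As a rough illustration of what a noise scaling factor does in evolution strategies, the sketch below perturbs a parameter vector with standard Gaussian noise multiplied by noise_stddev. The function perturb is hypothetical, and the actual sampling scheme used by the trainer may differ.

```python
import random
from typing import List


def perturb(
    params: List[float], noise_stddev: float, rng: random.Random
) -> List[float]:
    """Return a noisy copy of params.

    Each parameter receives an independent sample from N(0, 1) that is
    scaled by noise_stddev (hypothetical helper, standard library only).
    """
    return [p + noise_stddev * rng.gauss(0.0, 1.0) for p in params]


base = [1.0, 2.0, 3.0]

# A zero scaling factor leaves the parameters untouched.
unchanged = perturb(base, 0.0, random.Random(0))

# The same seed yields the same noise, so doubling noise_stddev doubles
# the offset of every parameter.
small = perturb(base, 0.1, random.Random(0))
large = perturb(base, 0.2, random.Random(0))
```

Larger values explore more widely around the current parameters, while smaller values keep the sampled candidates close to them.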

optimizer: Any

The optimizer used to update the policy based on the sampled gradient.

policy_wrapper: Optional[maze.core.agent.policy.Policy]

Optional policy wrapper that adds simulation logic or heuristics on top of a TorchPolicy.