ESAlgorithmConfig(n_epochs: int, n_rollouts_per_update: int, n_timesteps_per_update: int, max_steps: int, optimizer: Any, l2_penalty: float, noise_stddev: float)
Algorithm parameters for the evolution strategies model. Note: pass 0 as n_epochs to train indefinitely.
max_steps: Limit each episode rollout to a maximum number of steps. Set to 0 to disable this limit.
n_timesteps_per_update: Minimum number of cumulative environment steps per training iteration (= epoch). A training iteration finishes only once both the minimum number of episode rollouts (n_rollouts_per_update) and the minimum number of steps have been reached. Either of the two parameters can be set to 0.
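The completion rule for a training iteration can be sketched as follows. This is a minimal illustration rather than the library's implementation: `epoch_finished` and its counter arguments are hypothetical names, and the sketch assumes that a minimum of 0 simply means the corresponding criterion is always satisfied.

```python
def epoch_finished(n_rollouts_done: int, n_timesteps_done: int,
                   n_rollouts_per_update: int,
                   n_timesteps_per_update: int) -> bool:
    """Return True once both minimums are met (hypothetical helper).

    Assumption: a minimum of 0 is trivially satisfied, so setting one
    parameter to 0 leaves the other as the only active criterion.
    """
    enough_rollouts = n_rollouts_done >= n_rollouts_per_update
    enough_steps = n_timesteps_done >= n_timesteps_per_update
    # Both conditions must hold (logical AND), as the description states.
    return enough_rollouts and enough_steps
```

Under this reading, an iteration with n_rollouts_per_update=10 and n_timesteps_per_update=1000 keeps collecting rollouts after the tenth episode if fewer than 1000 steps have accumulated.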
optimizer: The optimizer used to update the policy based on the sampled gradient.
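To make the roles of noise_stddev, l2_penalty, and optimizer more concrete, here is a rough sketch of a generic evolution strategies update in plain Python. It is not the library's implementation: `es_gradient` and `sgd_ascent` are hypothetical helpers, antithetic sampling is a common choice assumed here, the way the L2 penalty enters the gradient is an assumption, and a plain gradient-ascent step stands in for whatever optimizer object the config actually receives.

```python
import random
from typing import Callable, List

Vector = List[float]


def es_gradient(params: Vector, fitness: Callable[[Vector], float],
                n_samples: int, noise_stddev: float, l2_penalty: float,
                rng: random.Random) -> Vector:
    """Estimate an ascent direction from Gaussian parameter perturbations."""
    grad = [0.0] * len(params)
    for _ in range(n_samples):
        eps = [rng.gauss(0.0, 1.0) for _ in params]
        # Antithetic pair: score theta + sigma * eps and theta - sigma * eps.
        plus = fitness([p + noise_stddev * e for p, e in zip(params, eps)])
        minus = fitness([p - noise_stddev * e for p, e in zip(params, eps)])
        for i, e in enumerate(eps):
            grad[i] += (plus - minus) * e
    scale = 1.0 / (2.0 * n_samples * noise_stddev)
    # Assumed form of the L2 penalty: shrink parameters toward zero.
    return [scale * g - l2_penalty * p for g, p in zip(grad, params)]


def sgd_ascent(params: Vector, grad: Vector, lr: float = 0.1) -> Vector:
    """Stand-in optimizer: one plain gradient-ascent step."""
    return [p + lr * g for p, g in zip(params, grad)]
```

In this sketch, each training iteration would call `es_gradient` with the fitness values obtained from the rollouts and then pass the result to the optimizer step; a larger noise_stddev explores more widely at the cost of a coarser gradient estimate.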