ESAlgorithmConfig
class maze.train.trainers.es.es_algorithm_config.ESAlgorithmConfig(n_epochs: int, n_rollouts_per_update: int, n_timesteps_per_update: int, max_steps: int, optimizer: Any, l2_penalty: float, noise_stddev: float, policy_wrapper: Optional[maze.core.agent.policy.Policy])
Algorithm parameters for the evolution strategies model. Note: Pass 0 to n_epochs to train indefinitely.
max_steps: int
Limit the episode rollouts to a maximum number of steps. Set to 0 to disable this option.
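The step limit described above can be pictured with a minimal sketch. This is not Maze's implementation: `run_rollout` and `make_episode` are hypothetical helpers written only for this illustration, and the sketch assumes that a value of 0 means the rollout runs until the episode ends on its own.

```python
from typing import Callable


def run_rollout(step: Callable[[], bool], max_steps: int) -> int:
    """Step until the episode ends or the step cap is hit; return steps taken.

    Hypothetical helper: `step` advances the environment once and returns
    True when the episode is done. `max_steps == 0` disables the cap.
    """
    n_steps = 0
    while True:
        done = step()
        n_steps += 1
        if done or (max_steps > 0 and n_steps >= max_steps):
            return n_steps


def make_episode(length: int) -> Callable[[], bool]:
    """Build a toy step function for an episode that ends after `length` steps."""
    remaining = [length]

    def step() -> bool:
        remaining[0] -= 1
        return remaining[0] <= 0

    return step
```

With a toy episode of 10 steps, a cap of 4 cuts the rollout short, while a cap of 0 lets it run to completion.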
n_timesteps_per_update: int
Minimum number of cumulative env steps per training iteration (= epoch). A training iteration finishes only once both the given number of episodes and the given number of steps have been reached. One of the two parameters can be set to 0.
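The completion rule for a training iteration can be written as a small predicate. This is a sketch under assumptions, not the trainer's actual code: `update_finished` is a hypothetical function, and it assumes that the "given number of episodes" corresponds to n_rollouts_per_update and that a threshold of 0 is always satisfied.

```python
def update_finished(
    n_episodes: int,
    n_steps: int,
    n_rollouts_per_update: int,
    n_timesteps_per_update: int,
) -> bool:
    """Return True once both the episode and the step thresholds are met.

    Hypothetical helper: a threshold of 0 is met trivially, so only the
    other parameter controls when the iteration ends.
    """
    enough_episodes = n_episodes >= n_rollouts_per_update
    enough_steps = n_steps >= n_timesteps_per_update
    return enough_episodes and enough_steps
```

For example, with thresholds of 5 episodes and 200 steps, an iteration that has collected 5 episodes but only 100 steps keeps sampling until the step count also reaches 200.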
optimizer: Any
The optimizer used to update the policy based on the sampled gradient.
policy_wrapper: Optional[maze.core.agent.policy.Policy]
Support for simulation logic or heuristics on top of a TorchPolicy.