ESDistributedRollouts

class maze.train.trainers.es.distributed.es_distributed_rollouts.ESDistributedRollouts

Abstract base class for strategies that distribute ES (evolution strategies) rollouts.

abstract generate_rollouts(policy: Policy | TorchModel, max_steps: int | None, noise_stddev: float, normalization_stats: Dict[str, Tuple[numpy.ndarray, numpy.ndarray]]) → Generator[ESRolloutResult, None, None]

Declare a new rollout task and start producing results that can be obtained from the returned generator.

Note that different distribution strategies have different ways of balancing evaluation and training rollouts.

Parameters:

policy – Multi-step policy encapsulating the policy networks.
max_steps – Optionally limit the rollout to a number of environment steps (horizon).
noise_stddev – The standard deviation of the applied parameter noise.
normalization_stats – Normalization statistics as calculated by the NormalizeObservationWrapper.
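The generator contract described above can be illustrated with a minimal, self-contained sketch. It does not import Maze: `DistributedRolloutsSketch` and `RolloutResultSketch` are hypothetical stand-ins for `ESDistributedRollouts` and `ESRolloutResult`, the policy is reduced to a plain list of floats, and the "environment" is a toy reward function. Only the method name and its four parameters come from the documentation; everything else is an assumption made for illustration.

```python
import random
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, Generator, List, Optional


@dataclass
class RolloutResultSketch:
    """Hypothetical stand-in for ESRolloutResult (the real fields are not shown here)."""
    noise_seed: int
    episode_returns: List[float] = field(default_factory=list)


class DistributedRolloutsSketch(ABC):
    """Stand-in that mirrors the documented abstract method signature."""

    @abstractmethod
    def generate_rollouts(
        self,
        policy: List[float],
        max_steps: Optional[int],
        noise_stddev: float,
        normalization_stats: Dict[str, Any],
    ) -> Generator[RolloutResultSketch, None, None]:
        """Declare a rollout task and lazily yield its results."""


class InProcessRollouts(DistributedRolloutsSketch):
    """Toy strategy: run all rollouts sequentially in the calling process."""

    def __init__(self, n_rollouts: int, seed: int = 0, default_horizon: int = 10):
        self.n_rollouts = n_rollouts
        self.default_horizon = default_horizon
        self.rng = random.Random(seed)

    def generate_rollouts(self, policy, max_steps, noise_stddev, normalization_stats):
        # normalization_stats is accepted but unused: the toy has no observations.
        for _ in range(self.n_rollouts):
            noise_seed = self.rng.randrange(2 ** 31)
            yield self._single_rollout(policy, max_steps, noise_stddev, noise_seed)

    def _single_rollout(self, policy, max_steps, noise_stddev, noise_seed):
        # Perturb each parameter with Gaussian noise drawn from a per-rollout seed.
        noise_rng = random.Random(noise_seed)
        perturbed = [p + noise_rng.gauss(0.0, noise_stddev) for p in policy]
        # Toy per-step reward: negative squared norm of the perturbed parameters.
        step_reward = -sum(p * p for p in perturbed)
        horizon = max_steps if max_steps is not None else self.default_horizon
        total = 0.0
        for _ in range(horizon):
            total += step_reward
        return RolloutResultSketch(noise_seed=noise_seed, episode_returns=[total])


# Usage: results are produced only as the caller iterates over the generator.
rollouts = InProcessRollouts(n_rollouts=3, seed=0)
results = list(
    rollouts.generate_rollouts(
        policy=[0.5, -0.5], max_steps=5, noise_stddev=0.0, normalization_stats={}
    )
)
```

Because the method returns a generator, a caller can stop iterating as soon as it has collected enough results. A real distribution strategy would typically dispatch the rollouts to worker processes or a cluster rather than run them inline as this toy does.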