Abstract base class for distributing ES (evolution strategies) rollouts.

abstract generate_rollouts(policy: maze.core.agent.torch_policy.TorchPolicy, max_steps: Optional[int], noise_stddev: float, normalization_stats: Dict[str, Tuple[numpy.ndarray, numpy.ndarray]]) → Generator[, None, None]

Declare a new rollout task and start producing results that can be obtained from the returned generator.

Note that different distribution strategies have different ways of balancing evaluation and training rollouts.

  • policy – Multi-step policy encapsulating the policy networks.

  • max_steps – Optionally limit the rollout to a number of environment steps (horizon).

  • noise_stddev – The standard deviation of the applied parameter noise.

  • normalization_stats – Normalization statistics as calculated by the NormalizeObservationWrapper.
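
The generator contract described above can be illustrated with a minimal, self-contained sketch. It does not use Maze: `RolloutDistribution`, `SequentialRollouts`, `toy_episode`, the result dictionary keys, and the list-of-floats "policy" are all hypothetical stand-ins chosen for illustration. Running a fixed number of evaluation rollouts before an open-ended stream of training rollouts is likewise an assumption about one possible balancing strategy, not the behavior of any Maze class.

```python
import itertools
import random
from abc import ABC, abstractmethod
from typing import Callable, Dict, Generator, List, Optional, Tuple

# Hypothetical stand-ins (not Maze types): a flat parameter vector as the
# "policy" and per-observation (mean, std) pairs as normalization statistics.
Params = List[float]
Stats = Dict[str, Tuple[List[float], List[float]]]
EpisodeFn = Callable[[Params, Optional[int], Stats], float]


class RolloutDistribution(ABC):
    """Toy mirror of the abstract interface documented above."""

    @abstractmethod
    def generate_rollouts(self, policy: Params, max_steps: Optional[int],
                          noise_stddev: float, normalization_stats: Stats
                          ) -> Generator[dict, None, None]:
        """Start a rollout task and yield results as they become available."""


class SequentialRollouts(RolloutDistribution):
    """Runs every rollout in the calling process, one after another."""

    def __init__(self, episode_fn: EpisodeFn, n_eval_rollouts: int, seed: int = 0):
        self.episode_fn = episode_fn
        self.n_eval_rollouts = n_eval_rollouts
        self.rng = random.Random(seed)

    def generate_rollouts(self, policy, max_steps, noise_stddev, normalization_stats):
        # Assumed strategy: evaluation rollouts first, with unperturbed parameters.
        for _ in range(self.n_eval_rollouts):
            ret = self.episode_fn(policy, max_steps, normalization_stats)
            yield {"is_eval": True, "noise": None, "episode_return": ret}

        # Training rollouts: perturb the parameters with Gaussian noise and keep
        # yielding until the consumer stops iterating.
        while True:
            noise = [self.rng.gauss(0.0, noise_stddev) for _ in policy]
            perturbed = [p + n for p, n in zip(policy, noise)]
            ret = self.episode_fn(perturbed, max_steps, normalization_stats)
            yield {"is_eval": False, "noise": noise, "episode_return": ret}


def toy_episode(params: Params, max_steps: Optional[int], stats: Stats) -> float:
    """Hypothetical episode: constant raw observation of 1.0, linear reward."""
    horizon = 5 if max_steps is None else min(5, max_steps)
    mean, std = stats["observation"]
    obs = [(1.0 - m) / s for m, s in zip(mean, std)]  # normalized observation
    return horizon * sum(p * o for p, o in zip(params, obs))


rollouts = SequentialRollouts(toy_episode, n_eval_rollouts=1)
generator = rollouts.generate_rollouts(
    policy=[1.0, 2.0],
    max_steps=3,
    noise_stddev=0.1,
    normalization_stats={"observation": ([0.0, 0.0], [1.0, 1.0])},
)
# The consumer decides how many results to collect from the generator.
results = list(itertools.islice(generator, 4))
```

Because the results are pulled lazily, the caller controls how many rollouts are consumed per update; a distributed implementation could yield results in whatever order its workers finish.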