class maze.core.env.structured_env.StructuredEnv, n_eval_rollouts: int, shared_noise:, agent_instance_seed: int)

Implements the distribution of ES rollouts by running them synchronously in the same process.

generate_rollouts(policy: maze.core.agent.torch_policy.TorchPolicy, max_steps: Optional[int], noise_stddev: float, normalization_stats: Dict[str, Dict[str, Union[numpy.ndarray, float, int, Iterable[Union[float, int]]]]]) → Generator[, None, None]

(overrides ESDistributedRollouts)

First executes a fixed number of evaluation rollouts (n_eval_rollouts), then continues by producing training samples.
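The two-phase behaviour described above can be illustrated with a plain Python generator. This is a minimal sketch, not Maze's implementation: `generate_rollouts_sketch`, `run_episode`, `toy_episode`, and the `("eval" | "train", reward)` tuples are hypothetical names chosen for illustration. The sketch omits `max_steps` and `normalization_stats`, and assumes that a training sample is produced by perturbing the policy with Gaussian noise scaled by `noise_stddev`.

```python
import itertools
import random
from typing import Callable, Generator, Tuple


def generate_rollouts_sketch(
    run_episode: Callable[[float], float],  # hypothetical: maps a perturbation to a reward
    n_eval_rollouts: int,
    noise_stddev: float,
    seed: int = 0,
) -> Generator[Tuple[str, float], None, None]:
    """Yield n_eval_rollouts evaluation results, then training results indefinitely."""
    rng = random.Random(seed)

    # Phase 1: a fixed number of evaluation rollouts with an unperturbed policy.
    for _ in range(n_eval_rollouts):
        yield "eval", run_episode(0.0)

    # Phase 2: training samples, each with a freshly drawn perturbation.
    # The consumer decides when to stop pulling from the generator.
    while True:
        yield "train", run_episode(rng.gauss(0.0, noise_stddev))


def toy_episode(perturbation: float) -> float:
    # Stand-in for a real environment rollout: reward drops as the perturbation grows.
    return 1.0 - abs(perturbation)


# Pull five results synchronously, in the same process.
samples = list(itertools.islice(generate_rollouts_sketch(toy_episode, 2, 0.1), 5))
kinds = [kind for kind, _ in samples]
```

Because the generator never terminates on its own in the training phase, the caller bounds the work (here with `itertools.islice`), so no worker processes or queues are needed for the synchronous case.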