ESRolloutWorkerWrapper
- class maze.train.trainers.es.distributed.es_rollout_wrapper.ESRolloutWorkerWrapper(env: StructuredEnv | LogStatsEnv, shared_noise: SharedNoiseTable, agent_instance_seed: int)
Rollout generation is implemented as a wrapper class, which binds it to a single worker environment.
- clear_abort()
Clear the abort flag.
- generate_evaluation(policy: Policy | TorchModel) → ESRolloutResult
Generate a single evaluation rollout.
- Parameters:
policy – Multi-step policy encapsulating the policy networks.
- Returns:
A result set with a single evaluation rollout.
- generate_training(policy: Policy | TorchModel, noise_stddev: float) → ESRolloutResult
Generate a single training sample, consisting of two rollouts, obtained by adding and subtracting the same random perturbation vector from the policy.
- Parameters:
policy – Multi-step policy encapsulating the policy networks.
noise_stddev – The standard deviation of the applied parameter noise.
- Returns:
A result set with a pair of rollouts generated by adding and subtracting the perturbation (antithetic sampling).
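The antithetic sampling described above can be illustrated with a minimal sketch. It is not the Maze implementation: the helper `antithetic_pair` is hypothetical, the parameters are a plain list of floats, and the noise is drawn from Python's `random` module rather than from a `SharedNoiseTable`. It only shows how one perturbation vector yields two mirrored parameter sets.

```python
import random
from typing import List, Tuple


def antithetic_pair(
    params: List[float], noise_stddev: float, rng: random.Random
) -> Tuple[List[float], List[float], List[float]]:
    """Hypothetical helper: build one antithetic pair of parameter vectors."""
    # Draw one standard-normal perturbation value per parameter.
    epsilon = [rng.gauss(0.0, 1.0) for _ in params]
    # Add and subtract the SAME scaled perturbation.
    plus = [p + noise_stddev * e for p, e in zip(params, epsilon)]
    minus = [p - noise_stddev * e for p, e in zip(params, epsilon)]
    return plus, minus, epsilon


rng = random.Random(0)
theta = [1.0, -2.0, 0.5]
theta_plus, theta_minus, epsilon = antithetic_pair(theta, noise_stddev=0.1, rng=rng)
```

Because both vectors share the same perturbation, their midpoint is the unperturbed parameter vector, and each would be evaluated in its own rollout.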
- rollout(policy: Policy | TorchModel) → None
Use the passed policy to step the environment until it is done.
This method does not return any results; query the episode statistics instead to process the outcome of the rollout.
- Parameters:
policy – Multi-step policy encapsulating the policy networks
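The pattern of stepping until the episode is done and then reading results from the environment, rather than from a return value, might look like the sketch below. Everything in it is an assumption for illustration: `CountdownEnv` is a hypothetical toy environment with a Gym-style `reset`/`step` interface, the policy is a plain callable, and the two counters stand in for real episode statistics.

```python
class CountdownEnv:
    """Hypothetical toy environment: the episode ends after `horizon` steps."""

    def __init__(self, horizon: int) -> None:
        self.horizon = horizon
        self.steps_taken = 0
        self.episode_reward = 0.0

    def reset(self) -> int:
        self.steps_taken = 0
        self.episode_reward = 0.0
        return 0

    def step(self, action: int):
        self.steps_taken += 1
        reward = float(action)
        # Statistics are accumulated on the environment itself.
        self.episode_reward += reward
        done = self.steps_taken >= self.horizon
        return self.steps_taken, reward, done, {}


def rollout(env: CountdownEnv, policy) -> None:
    """Step the environment with the policy until the episode is done."""
    observation = env.reset()
    done = False
    while not done:
        action = policy(observation)
        observation, _reward, done, _info = env.step(action)


env = CountdownEnv(horizon=5)
result = rollout(env, policy=lambda observation: 1)
```

After the call, `result` is `None`, and the outcome is read from `env.steps_taken` and `env.episode_reward`.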
- set_abort()
Abort the rollout (intended to be called from a thread).
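How an abort flag set from one thread can stop a loop running in another is sketched below. This is a generic pattern and an assumption about the mechanism, not the wrapper's actual code: `AbortableLoop` is a hypothetical class, and `threading.Event` is used here as the thread-safe flag.

```python
import threading


class AbortableLoop:
    """Hypothetical stand-in for a worker whose loop can be aborted."""

    def __init__(self) -> None:
        # threading.Event acts as a thread-safe boolean flag.
        self._abort = threading.Event()

    def set_abort(self) -> None:
        self._abort.set()

    def clear_abort(self) -> None:
        self._abort.clear()

    def run(self, max_steps: int) -> int:
        """Step until max_steps is reached or the abort flag is set."""
        steps = 0
        while steps < max_steps and not self._abort.is_set():
            steps += 1
        return steps


loop = AbortableLoop()

# Request the abort from a second thread and wait for that thread to finish.
aborter = threading.Thread(target=loop.set_abort)
aborter.start()
aborter.join()
steps_while_aborted = loop.run(max_steps=100)

# Clearing the flag lets the next run proceed normally.
loop.clear_abort()
steps_after_clear = loop.run(max_steps=100)
```

In this sketch the flag stays set until it is cleared explicitly, so a reused worker has to call `clear_abort()` before its next run.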