ESRolloutWorkerWrapper
-
class maze.train.trainers.es.distributed.es_rollout_wrapper.ESRolloutWorkerWrapper(*args, **kwds)
The rollout generation is bound to a single worker environment by implementing it as a Wrapper class.
-
clear_abort()
Clear the abort flag.
-
generate_evaluation(policy: Union[maze.core.agent.policy.Policy, maze.core.agent.torch_model.TorchModel]) → maze.train.trainers.es.distributed.es_distributed_rollouts.ESRolloutResult
Generate a single evaluation rollout.
- Parameters
policy – Multi-step policy encapsulating the policy networks.
- Returns
A result set with a single evaluation rollout.
-
generate_training(policy: Union[maze.core.agent.policy.Policy, maze.core.agent.torch_model.TorchModel], noise_stddev: float) → maze.train.trainers.es.distributed.es_distributed_rollouts.ESRolloutResult
Generate a single training sample consisting of two rollouts, obtained by adding the same random perturbation vector to the policy parameters and subtracting it from them.
- Parameters
policy – Multi-step policy encapsulating the policy networks.
noise_stddev – The standard deviation of the applied parameter noise.
- Returns
A result set with a pair of rollouts generated by adding/subtracting the perturbations (antithetic sampling).
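The antithetic pair described for generate_training can be illustrated with a minimal, library-independent sketch. It does not use the Maze API: `antithetic_pair`, `toy_return`, and the flat list of parameters are hypothetical stand-ins, and the actual wrapper perturbs the policy networks and returns an ESRolloutResult instead.

```python
import random
from typing import Callable, List, Tuple


def antithetic_pair(
    params: List[float],
    noise_stddev: float,
    evaluate: Callable[[List[float]], float],
    rng: random.Random,
) -> Tuple[List[float], float, float]:
    """Evaluate params + noise and params - noise for one shared noise vector."""
    # Draw a single perturbation vector with the given standard deviation.
    noise = [rng.gauss(0.0, noise_stddev) for _ in params]
    # Both rollouts of the pair reuse the same noise vector, with opposite sign.
    plus = [p + n for p, n in zip(params, noise)]
    minus = [p - n for p, n in zip(params, noise)]
    return noise, evaluate(plus), evaluate(minus)


def toy_return(theta: List[float]) -> float:
    """Hypothetical objective standing in for an episode return."""
    return -sum(x * x for x in theta)


rng = random.Random(0)
noise, ret_plus, ret_minus = antithetic_pair([1.0, -2.0], 0.1, toy_return, rng)
```

Keeping the noise vector alongside both returns is what lets an evolution-strategies update weight the perturbation by the difference of the two returns.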
-
rollout(policy: Union[maze.core.agent.policy.Policy, maze.core.agent.torch_model.TorchModel]) → None
Use the passed policy to step the environment until it is done.
This method does not return any results. Query the episode statistics instead to process them.
- Parameters
policy – Multi-step policy encapsulating the policy networks.
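The pattern of stepping until done while leaving the results in the environment can be sketched as follows. This is an assumption-laden toy, not Maze code: `CountdownEnv`, its `episode_rewards` list, and the callable policy are hypothetical, and the real wrapper exposes its results through the Maze episode statistics.

```python
from typing import Callable, List, Tuple


class CountdownEnv:
    """Hypothetical stand-in environment: an episode ends after `horizon` steps."""

    def __init__(self, horizon: int) -> None:
        self.horizon = horizon
        self.steps_left = horizon
        self._reward_sum = 0.0
        self.episode_rewards: List[float] = []  # statistics of finished episodes

    def reset(self) -> int:
        self.steps_left = self.horizon
        self._reward_sum = 0.0
        return self.steps_left

    def step(self, action: float) -> Tuple[int, float, bool]:
        # The toy reward is simply the action value.
        self.steps_left -= 1
        self._reward_sum += action
        done = self.steps_left == 0
        if done:
            # Record the episode total so it can be queried after the rollout.
            self.episode_rewards.append(self._reward_sum)
        return self.steps_left, action, done


def rollout(env: CountdownEnv, policy: Callable[[int], float]) -> None:
    """Step the environment with the policy until done; nothing is returned."""
    obs = env.reset()
    done = False
    while not done:
        obs, _reward, done = env.step(policy(obs))


env = CountdownEnv(horizon=3)
rollout(env, policy=lambda obs: 1.0)
total_reward = env.episode_rewards[-1]  # query the statistics afterwards
```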
-
set_abort()
Abort the rollout (intended to be called from a thread).
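How set_abort and clear_abort might interact with a running rollout can be sketched with a thread-safe flag. How Maze stores the flag internally is not stated here, so the use of `threading.Event` and the `AbortableLoop` class are assumptions made for illustration only.

```python
import threading


class AbortableLoop:
    """Hypothetical worker loop guarded by a thread-safe abort flag."""

    def __init__(self) -> None:
        self._abort = threading.Event()
        self.steps = 0

    def set_abort(self) -> None:
        # Safe to call from another thread while run() is executing.
        self._abort.set()

    def clear_abort(self) -> None:
        self._abort.clear()

    def run(self, max_steps: int) -> bool:
        """Return True if all steps were taken, False if the loop was aborted."""
        for _ in range(max_steps):
            if self._abort.is_set():
                return False
            self.steps += 1
        return True


loop = AbortableLoop()
aborter = threading.Thread(target=loop.set_abort)
aborter.start()
aborter.join()
aborted_result = loop.run(max_steps=5)   # flag already set, so no step is taken
loop.clear_abort()                       # reset the flag before the next rollout
finished_result = loop.run(max_steps=5)
```

The flag stays set until it is cleared explicitly, which is why a clear call is needed before the worker is reused.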
-