ESRolloutWorkerWrapper

class maze.train.trainers.es.distributed.es_rollout_wrapper.ESRolloutWorkerWrapper(*args, **kwds)

Binds rollout generation to a single worker environment by implementing it as a Wrapper class.

clear_abort(): Clear the abort flag.

generate_evaluation(policy: Union[maze.core.agent.policy.Policy, maze.core.agent.torch_model.TorchModel]) → maze.train.trainers.es.distributed.es_distributed_rollouts.ESRolloutResult

Generate a single evaluation rollout.

Parameters: policy – Multi-step policy encapsulating the policy networks

Returns: A result set with a single evaluation rollout.

generate_training(policy: Union[maze.core.agent.policy.Policy, maze.core.agent.torch_model.TorchModel], noise_stddev: float) → maze.train.trainers.es.distributed.es_distributed_rollouts.ESRolloutResult

Generate a single training sample consisting of two rollouts: one with a random perturbation vector added to the policy parameters, and one with the same vector subtracted from them.

Parameters

policy – Multi-step policy encapsulating the policy networks.
noise_stddev – The standard deviation of the applied parameter noise.

Returns: A result set with a pair of rollouts generated by adding and subtracting the same perturbation (antithetic sampling).
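The antithetic sampling idea described above can be illustrated with a minimal NumPy sketch. It is not the Maze implementation: `antithetic_pair` and `toy_return` are hypothetical names, the policy is reduced to a flat parameter vector, and a simple quadratic stands in for the return of an episode.

```python
import numpy as np


def antithetic_pair(params, noise_stddev, evaluate, rng):
    """Evaluate one perturbation with both signs (hypothetical helper)."""
    # Draw a single noise vector and reuse it for both rollouts.
    noise = rng.standard_normal(params.shape) * noise_stddev
    reward_pos = evaluate(params + noise)  # rollout with +noise
    reward_neg = evaluate(params - noise)  # rollout with -noise
    return noise, reward_pos, reward_neg


def toy_return(params):
    # Assumed stand-in for an episode return; highest at params == 1.
    return -float(np.sum((params - 1.0) ** 2))


rng = np.random.default_rng(0)
params = np.zeros(3)
noise, reward_pos, reward_neg = antithetic_pair(params, 0.1, toy_return, rng)
```

Because both rollouts share the same noise vector, the difference `reward_pos - reward_neg` isolates the effect of the perturbation direction, which is what an ES update typically weights the noise by.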

rollout(policy: Union[maze.core.agent.policy.Policy, maze.core.agent.torch_model.TorchModel]) → None

Use the passed policy to step the environment until it is done.

This method does not return any results. To process the outcome of the rollout, query the episode statistics instead.

Parameters: policy – Multi-step policy encapsulating the policy networks
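The pattern of stepping until the episode is done and reading results from statistics, rather than from a return value, might look roughly like the sketch below. Everything in it is assumed for illustration: `CountdownEnv`, the plain `stats` dictionary, and the callable policy are hypothetical stand-ins, not Maze's environment, statistics, or policy interfaces.

```python
class CountdownEnv:
    """Hypothetical toy environment that ends after `horizon` steps."""

    def __init__(self, horizon):
        self.horizon = horizon
        self.remaining = horizon

    def reset(self):
        self.remaining = self.horizon
        return self.remaining

    def step(self, action):
        self.remaining -= 1
        reward = float(action)  # reward simply echoes the action
        done = self.remaining <= 0
        return self.remaining, reward, done, {}


def rollout(env, policy, stats):
    """Step the environment until done; results are recorded in `stats`."""
    obs = env.reset()
    done = False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        stats["reward_sum"] += reward
        stats["step_count"] += 1


stats = {"reward_sum": 0.0, "step_count": 0}
rollout(CountdownEnv(horizon=4), lambda obs: 1, stats)
```

After the call, the caller inspects `stats` instead of a return value, mirroring how the method above leaves its results in the episode statistics.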

set_abort(): Abort the rollout (intended to be called from a thread).
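One common way to build an abort flag that another thread can set is `threading.Event`. The sketch below assumes that approach purely for illustration; `AbortableRollout` is a hypothetical class, and how the actual wrapper stores and checks its flag is not stated here.

```python
import threading


class AbortableRollout:
    """Hypothetical stand-in showing a thread-safe abort flag."""

    def __init__(self, max_steps):
        self._abort = threading.Event()
        self.max_steps = max_steps
        self.steps_taken = 0

    def set_abort(self):
        # Safe to call from any thread.
        self._abort.set()

    def clear_abort(self):
        self._abort.clear()

    def rollout(self):
        self.steps_taken = 0
        for _ in range(self.max_steps):
            # Check the flag before every step and stop early if it is set.
            if self._abort.is_set():
                break
            self.steps_taken += 1


worker = AbortableRollout(max_steps=5)

# Set the flag from a second thread, then wait for that thread to finish.
aborter = threading.Thread(target=worker.set_abort)
aborter.start()
aborter.join()

worker.rollout()
steps_when_aborted = worker.steps_taken

# Clearing the flag lets the next rollout run to completion.
worker.clear_abort()
worker.rollout()
steps_after_clear = worker.steps_taken
```

The flag stays set until it is cleared explicitly, so in this sketch a rollout started after an abort ends immediately unless `clear_abort()` is called first.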