ESRolloutWorkerWrapper

class maze.train.trainers.es.distributed.es_rollout_wrapper.ESRolloutWorkerWrapper(*args, **kwds)

Rollout generation is bound to a single worker environment by implementing it as a wrapper class around that environment.
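The wrapper pattern can be sketched in plain Python. The wrapper owns exactly one environment and forwards any attribute it does not define itself to that environment, so rollout-specific methods can be layered on top. `ToyEnv` and `RolloutWorkerWrapperSketch` are hypothetical names, and this is not Maze's actual `Wrapper` base class.

```python
class ToyEnv:
    """Hypothetical environment used only for illustration."""

    def __init__(self):
        self.horizon = 5

    def reset(self) -> int:
        return 0


class RolloutWorkerWrapperSketch:
    """Hypothetical wrapper: bound to one env, forwards unknown attributes to it."""

    def __init__(self, env):
        self.env = env

    def __getattr__(self, name):
        # Only called when regular lookup fails. Guard "env" to avoid infinite
        # recursion if the attribute has not been set yet.
        if name == "env":
            raise AttributeError(name)
        return getattr(self.env, name)


wrapper = RolloutWorkerWrapperSketch(ToyEnv())
print(wrapper.horizon)  # forwarded to ToyEnv: 5
print(wrapper.reset())  # forwarded to ToyEnv: 0
```

Because each wrapper instance holds a single environment, each distributed worker can own one wrapper and generate rollouts without sharing environment state.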

clear_abort()

Clear the abort flag.

generate_evaluation(policy: maze.core.agent.torch_policy.TorchPolicy) → maze.train.trainers.es.distributed.es_distributed_rollouts.ESRolloutResult

Generate a single evaluation rollout.

Parameters

policy – Multi-step policy encapsulating the policy networks.

Returns

A result set with a single evaluation rollout.

generate_training(policy: maze.core.agent.torch_policy.TorchPolicy, noise_stddev: float) → maze.train.trainers.es.distributed.es_distributed_rollouts.ESRolloutResult

Generate a single training sample consisting of two rollouts, obtained by adding the same random perturbation vector to the policy parameters and by subtracting it from them.

Parameters
  • policy – Multi-step policy encapsulating the policy networks.

  • noise_stddev – The standard deviation of the applied parameter noise.

Returns

A result set with a pair of rollouts generated by adding and subtracting the same perturbation (antithetic sampling).
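Antithetic sampling can be sketched on a flat parameter vector: draw one noise vector `eps`, evaluate both `theta + noise_stddev * eps` and `theta - noise_stddev * eps`, and keep the two returns together with the shared noise. The flat list representation, the `evaluate` callback, and `toy_return` are assumptions for illustration; Maze's method perturbs the networks of a `TorchPolicy` and returns an `ESRolloutResult`.

```python
import random
from typing import Callable, List, Tuple


def antithetic_pair(
    theta: List[float],
    noise_stddev: float,
    evaluate: Callable[[List[float]], float],
    rng: random.Random,
) -> Tuple[List[float], float, float]:
    """Evaluate theta + sigma*eps and theta - sigma*eps with one shared noise vector."""
    eps = [rng.gauss(0.0, 1.0) for _ in theta]
    theta_pos = [t + noise_stddev * e for t, e in zip(theta, eps)]
    theta_neg = [t - noise_stddev * e for t, e in zip(theta, eps)]
    return eps, evaluate(theta_pos), evaluate(theta_neg)


def toy_return(params: List[float]) -> float:
    # Hypothetical objective standing in for the return of one rollout.
    return -sum(p * p for p in params)


rng = random.Random(0)
theta = [0.5, -1.0, 2.0]
sigma = 0.1
eps, r_pos, r_neg = antithetic_pair(theta, sigma, toy_return, rng)

# Antithetic ES estimate for this single pair:
# (F(theta + sigma*eps) - F(theta - sigma*eps)) / (2*sigma) * eps.
# A full update would average this over many pairs.
grad_contrib = [(r_pos - r_neg) / (2 * sigma) * e for e in eps]
```

For the quadratic toy objective the pair estimate equals -2 (theta · eps) eps, whose expectation over standard normal noise is the true gradient -2 theta. Sharing one `eps` between the two rollouts cancels the even-order terms of the objective in the difference, which is the variance reduction that antithetic sampling provides.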

rollout(policy: maze.core.agent.torch_policy.TorchPolicy) → None

Use the passed policy to step the environment until it is done.

This method does not return any results; query the episode statistics instead to process them.

Parameters

policy – Multi-step policy encapsulating the policy networks.
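A rollout that returns nothing and records its outcome as episode statistics can be sketched as below. `StepCounterEnv`, `EpisodeStats`, and the callable policy are hypothetical stand-ins; the actual wrapper reports results through the environment's own episode statistics, whose API is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, List


class StepCounterEnv:
    """Hypothetical env: reward equals the action, episode ends after `horizon` steps."""

    def __init__(self, horizon: int = 4):
        self.horizon = horizon
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action: float):
        self.t += 1
        done = self.t >= self.horizon
        return self.t, float(action), done, {}


@dataclass
class EpisodeStats:
    """Hypothetical stand-in for per-episode statistics."""
    returns: List[float] = field(default_factory=list)
    lengths: List[int] = field(default_factory=list)


def rollout(env: StepCounterEnv, policy: Callable[[int], float], stats: EpisodeStats) -> None:
    """Step `env` with `policy` until done; record results instead of returning them."""
    obs = env.reset()
    total, steps, done = 0.0, 0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
        steps += 1
    stats.returns.append(total)
    stats.lengths.append(steps)


stats = EpisodeStats()
rollout(StepCounterEnv(horizon=4), policy=lambda obs: 1.0, stats=stats)
print(stats.returns, stats.lengths)  # [4.0] [4]
```

Keeping the return type `None` lets the same loop serve both evaluation and training rollouts; the caller decides which statistics to read afterwards.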

set_abort()

Abort the rollout (intended to be called from another thread).
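The `set_abort()` / `clear_abort()` pair can be sketched with a `threading.Event` that the rollout loop checks once per step, while a controller thread sets it. `AbortableRollout`, the per-step check, and the sleep standing in for an environment step are assumptions for illustration, not Maze's implementation.

```python
import threading
import time


class AbortableRollout:
    """Hypothetical sketch of an abort flag shared between a rollout and a controller thread."""

    def __init__(self):
        self._abort = threading.Event()

    def set_abort(self) -> None:
        # Safe to call from any thread: Event is internally synchronized.
        self._abort.set()

    def clear_abort(self) -> None:
        self._abort.clear()

    def rollout(self, max_steps: int = 1_000_000) -> int:
        """Simulate stepping an env; stop early once the abort flag is set. Returns steps taken."""
        steps = 0
        while steps < max_steps and not self._abort.is_set():
            time.sleep(0.001)  # stand-in for one env step
            steps += 1
        return steps


worker = AbortableRollout()
timer = threading.Timer(0.05, worker.set_abort)  # abort from another thread after ~50 ms
timer.start()
steps = worker.rollout()
timer.join()
worker.clear_abort()  # reset the flag before the next rollout
```

Checking the flag between steps means an abort takes effect at the next step boundary rather than interrupting a step midway, which keeps the environment in a consistent state.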