ESRolloutWorkerWrapper

class maze.train.trainers.es.distributed.es_rollout_wrapper.ESRolloutWorkerWrapper(env: StructuredEnv | LogStatsEnv, shared_noise: SharedNoiseTable, agent_instance_seed: int)

Binds rollout generation to a single worker environment by implementing it as a wrapper class.

clear_abort()

Clear the abort flag.

generate_evaluation(policy: Policy | TorchModel) → ESRolloutResult

Generate a single evaluation rollout.

Parameters:

policy – Multi-step policy encapsulating the policy networks.

Returns: A result set with a single evaluation rollout.

generate_training(policy: Policy | TorchModel, noise_stddev: float) → ESRolloutResult

Generate a single training sample, consisting of two rollouts, obtained by adding and subtracting the same random perturbation vector from the policy.

Parameters:
  • policy – Multi-step policy encapsulating the policy networks.

  • noise_stddev – The standard deviation of the applied parameter noise.

Returns: A result set with a pair of rollouts generated by adding and subtracting the perturbation (antithetic sampling).
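The antithetic sampling idea described above can be illustrated with a minimal, library-independent sketch. It does not use the Maze API: `antithetic_pair` and `toy_return` are hypothetical names, the policy is assumed to be a flat list of floats, and the noise is drawn directly from a seeded generator rather than from a `SharedNoiseTable`.

```python
import random


def antithetic_pair(params, noise_stddev, rng):
    """Perturb params by +sigma*eps and -sigma*eps using one shared noise vector."""
    # One standard-normal noise value per parameter, shared by both perturbations.
    eps = [rng.gauss(0.0, 1.0) for _ in params]
    plus = [p + noise_stddev * e for p, e in zip(params, eps)]
    minus = [p - noise_stddev * e for p, e in zip(params, eps)]
    return plus, minus


def toy_return(params):
    """Hypothetical stand-in for the return of one rollout (higher is better)."""
    return -sum(p * p for p in params)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
params = [0.5, -1.0, 2.0]
plus, minus = antithetic_pair(params, noise_stddev=0.1, rng=rng)

# One "training sample" in this sketch: the returns of the two mirrored rollouts.
returns = (toy_return(plus), toy_return(minus))
```

Because both perturbations share the same noise vector, their midpoint is the unperturbed parameter vector, which is what makes the pair a mirrored (antithetic) sample.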

rollout(policy: Policy | TorchModel) → None

Use the passed policy to step the environment until it is done.

This method does not return any results; query the episode statistics instead to process them.

Parameters:

policy – Multi-step policy encapsulating the policy networks

set_abort()

Abort the rollout (intended to be called from a thread).
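How an abort flag set from another thread might interact with a rollout loop can be sketched as follows. This is an illustration under stated assumptions, not the Maze implementation: the class `AbortableRollout`, its `max_steps` and `steps_taken` attributes, and the use of `threading.Event` are all hypothetical, and the environment step is replaced by a counter.

```python
import threading


class AbortableRollout:
    """Hypothetical stand-in for a rollout worker with an abort flag."""

    def __init__(self, max_steps):
        self._abort = threading.Event()  # assumed flag type, safe to set across threads
        self.max_steps = max_steps
        self.steps_taken = 0

    def set_abort(self):
        # Request that the rollout loop stops at its next check.
        self._abort.set()

    def clear_abort(self):
        # Reset the flag so that later rollouts run normally.
        self._abort.clear()

    def rollout(self):
        # Step until done (here: max_steps) or until an abort is requested.
        self.steps_taken = 0
        for _ in range(self.max_steps):
            if self._abort.is_set():
                return
            self.steps_taken += 1  # placeholder for one environment step


worker = AbortableRollout(max_steps=1000)

# Set the flag from a separate thread, then wait for that thread to finish.
aborter = threading.Thread(target=worker.set_abort)
aborter.start()
aborter.join()

worker.rollout()
steps_when_aborted = worker.steps_taken  # loop exits before taking a step

worker.clear_abort()
worker.rollout()
steps_after_clear = worker.steps_taken  # loop runs to completion
```

In this sketch the flag stays set until `clear_abort()` is called, so a worker that has been aborted needs its flag cleared before it is reused.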