ESRolloutWorkerWrapper

class maze.train.trainers.es.distributed.es_rollout_wrapper.ESRolloutWorkerWrapper(env: StructuredEnv | LogStatsEnv, shared_noise: SharedNoiseTable, agent_instance_seed: int)

The rollout generation is bound to a single worker environment by implementing it as a Wrapper class.

clear_abort()

Clear the abort flag.

generate_evaluation(policy: Policy | TorchModel) → ESRolloutResult

Generate a single evaluation rollout.

Parameters: policy – Multi-step policy encapsulating the policy networks.

Returns: A result set with a single evaluation rollout.

generate_training(policy: Policy | TorchModel, noise_stddev: float) → ESRolloutResult

Generate a single training sample consisting of two rollouts: one with a random perturbation vector added to the policy parameters, and one with the same vector subtracted from them.

Parameters:

policy – Multi-step policy encapsulating the policy networks.
noise_stddev – The standard deviation of the applied parameter noise.

Returns: A result set with a pair of rollouts generated by adding and subtracting the same perturbation (antithetic sampling).
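The antithetic pairing described above can be illustrated with a minimal, standalone sketch. It does not use the Maze API: `antithetic_pair`, `toy_return`, and the flat list of parameters are hypothetical stand-ins, and the noise is drawn directly from a random generator rather than read from a shared noise table.

```python
import random
from typing import Callable, List, Tuple


def antithetic_pair(
    theta: List[float],
    noise_stddev: float,
    evaluate: Callable[[List[float]], float],
    rng: random.Random,
) -> Tuple[List[float], float, float]:
    """Evaluate theta + sigma * eps and theta - sigma * eps for one shared eps."""
    # Draw a single standard-normal perturbation vector (hypothetical noise source).
    eps = [rng.gauss(0.0, 1.0) for _ in theta]
    # Apply the same perturbation with opposite signs.
    theta_plus = [t + noise_stddev * e for t, e in zip(theta, eps)]
    theta_minus = [t - noise_stddev * e for t, e in zip(theta, eps)]
    return eps, evaluate(theta_plus), evaluate(theta_minus)


def toy_return(params: List[float]) -> float:
    # Hypothetical stand-in for an episode return; it peaks when every parameter is 1.0.
    return -sum((p - 1.0) ** 2 for p in params)


eps, reward_plus, reward_minus = antithetic_pair(
    theta=[0.0, 0.0, 0.0],
    noise_stddev=0.1,
    evaluate=toy_return,
    rng=random.Random(0),
)
```

Because both evaluations share one perturbation, the difference `reward_plus - reward_minus` isolates the effect of that direction, which is the usual motivation for antithetic sampling in evolution strategies.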

rollout(policy: Policy | TorchModel) → None

Use the passed policy to step the environment until it is done.

This method does not return any results; query the episode statistics instead to process them.

Parameters: policy – Multi-step policy encapsulating the policy networks.
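As a rough illustration of a rollout that returns nothing and leaves its results in the environment's statistics, consider the following sketch. `CountdownEnv` and its Gym-style `reset`/`step` interface are assumptions made for this example only; they are not the Maze environment or policy interfaces.

```python
from typing import Callable, List


class CountdownEnv:
    """Hypothetical toy environment with a Gym-style reset/step interface."""

    def __init__(self, horizon: int) -> None:
        self.horizon = horizon
        self.steps_left = 0
        # Statistics are collected as a side effect of stepping.
        self.episode_rewards: List[float] = []

    def reset(self) -> int:
        self.steps_left = self.horizon
        self.episode_rewards = []
        return self.steps_left

    def step(self, action: int):
        self.steps_left -= 1
        reward = float(action)
        self.episode_rewards.append(reward)
        done = self.steps_left == 0
        return self.steps_left, reward, done, {}


def rollout(env: CountdownEnv, policy: Callable[[int], int]) -> None:
    """Step the environment with the policy until the episode is done."""
    obs = env.reset()
    done = False
    while not done:
        action = policy(obs)
        obs, _reward, done, _info = env.step(action)


env = CountdownEnv(horizon=5)
rollout(env, policy=lambda obs: 1)
# The rollout itself returned None; the results are read from the environment.
total_reward = sum(env.episode_rewards)
```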

set_abort()

Abort the rollout (intended to be called from a thread).
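One common way to build an abort flag that another thread can set is `threading.Event`. The sketch below shows that pattern under the assumption that the rollout loop checks the flag between steps. `AbortableLoop` is a hypothetical class, and nothing here is taken from the wrapper's actual implementation.

```python
import threading


class AbortableLoop:
    """Hypothetical worker whose loop can be interrupted from another thread."""

    def __init__(self) -> None:
        self._abort = threading.Event()
        self.steps_done = 0

    def set_abort(self) -> None:
        # Safe to call from any thread.
        self._abort.set()

    def clear_abort(self) -> None:
        self._abort.clear()

    def run(self, max_steps: int) -> bool:
        """Return True if all steps completed, False if aborted early."""
        for _ in range(max_steps):
            if self._abort.is_set():
                return False
            self.steps_done += 1
        return True


worker = AbortableLoop()

# Set the flag from a separate thread, then wait for that thread to finish.
setter = threading.Thread(target=worker.set_abort)
setter.start()
setter.join()

aborted_run = worker.run(max_steps=10)    # flag is set: stops before the first step
worker.clear_abort()
completed_run = worker.run(max_steps=10)  # flag is cleared: runs all ten steps
```

Clearing the flag before the next run matters in this pattern, since a flag left set would make every later loop exit immediately.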