class, policy: maze.core.agent.torch_policy.TorchPolicy, shared_noise:, normalization_stats: Optional[Dict[str, Tuple[numpy.ndarray, numpy.ndarray]]])

Trainer class for OpenAI Evolution Strategies.

  • algorithm_config – Algorithm parameters.

  • policy – Multi-step policy encapsulating the policy networks.

  • shared_noise – The noise table, with the same content for every worker and the master.

  • normalization_stats – Normalization statistics as calculated by the NormalizeObservationWrapper.
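The shared noise table lets workers communicate perturbations cheaply. The sketch below illustrates that idea from OpenAI ES. It is not Maze's implementation: the class name SharedNoiseTable, the seed, and the sample_index/get methods are hypothetical. Because every process generates the table from the same seed, a worker only has to report an integer offset, and the master recovers the identical noise vector from its own copy.

```python
import numpy as np


class SharedNoiseTable:
    """Hypothetical sketch of a shared Gaussian noise table (not Maze's class).

    Every process builds the table from the same seed, so the master and all
    workers hold identical contents and only need to exchange integer offsets.
    """

    def __init__(self, size: int, seed: int = 123):
        rng = np.random.default_rng(seed)
        self.noise = rng.standard_normal(size).astype(np.float32)

    def sample_index(self, rng: np.random.Generator, dim: int) -> int:
        # Pick a start offset that leaves room for a slice of length dim.
        return int(rng.integers(0, len(self.noise) - dim + 1))

    def get(self, index: int, dim: int) -> np.ndarray:
        # Return a view of dim consecutive noise values starting at index.
        return self.noise[index:index + dim]
```

A worker that evaluated the perturbation at offset idx sends back only idx and its return; the master reconstructs the same vector with get(idx, dim). The trade-off is that every process keeps the full table in memory in exchange for tiny messages.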

load_state(file_path: Union[str, BinaryIO]) → None

(overrides Trainer)

Implementation of Trainer. Loads the trainer state from file_path, which may be a path string or an open binary file object.

load_state_dict(state_dict: Dict) → None

Set the model and optimizer state.

  • state_dict – The state dict.
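The two loading methods split the work: one resolves a path or stream, and the other applies an in-memory dictionary. A minimal sketch of that split follows. CheckpointableTrainer and the "model"/"optimizer" keys are hypothetical, and pickle stands in for whatever serializer the real trainer uses.

```python
import io
import pickle
from typing import BinaryIO, Dict, Union


class CheckpointableTrainer:
    """Hypothetical stand-in for a trainer; pickle replaces the real serializer."""

    def __init__(self):
        self.model_state: Dict = {}
        self.optimizer_state: Dict = {}

    def state_dict(self) -> Dict:
        # Bundle model and optimizer state into one dictionary (keys are assumptions).
        return {"model": self.model_state, "optimizer": self.optimizer_state}

    def load_state_dict(self, state_dict: Dict) -> None:
        # Set the model and optimizer state from the combined dictionary.
        self.model_state = state_dict["model"]
        self.optimizer_state = state_dict["optimizer"]

    def load_state(self, file_path: Union[str, BinaryIO]) -> None:
        # Accept either a filesystem path or an already-open binary stream.
        if isinstance(file_path, str):
            with open(file_path, "rb") as f:
                state_dict = pickle.load(f)
        else:
            state_dict = pickle.load(file_path)
        self.load_state_dict(state_dict)


# Usage: round-trip through an in-memory binary stream.
src = CheckpointableTrainer()
src.model_state = {"w": [1.0, 2.0]}
dst = CheckpointableTrainer()
dst.load_state(io.BytesIO(pickle.dumps(src.state_dict())))
```

With PyTorch, torch.load already accepts either a path or a file-like object, so a real implementation may not need the isinstance branch.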

train(distributed_rollouts:, n_epochs: Optional[int] = None, model_selection: Optional[maze.train.trainers.common.model_selection.model_selection_base.ModelSelectionBase] = None) → None

(overrides Trainer)

Run the ES training loop.

  • distributed_rollouts – The distribution interface for experience collection.

  • n_epochs – Number of epochs to train.

  • model_selection – Optional model selection class; receives model evaluation results.
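One epoch of an ES loop can be sketched with NumPy. Several assumptions apply. The function evaluate is a hypothetical stand-in for the returns that distributed_rollouts would collect. Perturbations are read from a noise array by offset. Mirrored sampling and centered-rank fitness shaping follow the OpenAI ES recipe. The update is plain gradient ascent rather than whatever optimizer the trainer is configured with. Treating n_epochs=None as "no epoch limit" is also an assumption of this sketch.

```python
from typing import Callable, Optional

import numpy as np


def centered_ranks(x: np.ndarray) -> np.ndarray:
    # Fitness shaping: replace raw returns by their ranks scaled into [-0.5, 0.5].
    ranks = np.empty(len(x), dtype=np.float64)
    ranks[x.argsort()] = np.arange(len(x))
    return ranks / (len(x) - 1) - 0.5


def es_train(
    evaluate: Callable[[np.ndarray], float],  # hypothetical rollout stand-in
    theta: np.ndarray,
    noise: np.ndarray,
    n_epochs: Optional[int] = None,  # None = no limit (assumption)
    pop_size: int = 50,
    sigma: float = 0.05,
    lr: float = 0.02,
    seed: int = 0,
) -> np.ndarray:
    rng = np.random.default_rng(seed)
    dim = theta.size
    epoch = 0
    while n_epochs is None or epoch < n_epochs:
        # Workers would only report these noise offsets, not full vectors.
        idx = rng.integers(0, noise.size - dim + 1, size=pop_size)
        eps = np.stack([noise[i:i + dim] for i in idx])
        # Mirrored sampling: evaluate theta + sigma*eps and theta - sigma*eps.
        r_pos = np.array([evaluate(theta + sigma * e) for e in eps])
        r_neg = np.array([evaluate(theta - sigma * e) for e in eps])
        shaped = centered_ranks(np.concatenate([r_pos, r_neg]))
        weights = shaped[:pop_size] - shaped[pop_size:]
        # Gradient estimate: noise directions weighted by shaped return differences.
        grad = weights @ eps / (2 * pop_size * sigma)
        theta = theta + lr * grad  # plain gradient ascent step
        epoch += 1
    return theta
```

Because centered ranks ignore the scale of the returns, the step size stays roughly constant while far from the optimum. That makes the update robust to reward scaling but means lr controls how closely theta settles near the optimum.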