class maze.train.trainers.common.evaluators.rollout_evaluator.RolloutEvaluator(eval_env: maze.train.parallelization.vector_env.structured_vector_env.StructuredVectorEnv, n_episodes: int, model_selection: Optional[maze.train.trainers.common.model_selection.model_selection_base.ModelSelectionBase], deterministic: bool = False)

Evaluates a given policy by rolling it out and collecting the mean reward.

  • eval_env – Distributed environment to run evaluation rollouts in.

  • n_episodes – Number of evaluation episodes to run. Note that the actual number might be slightly larger due to the distributed nature of the environment.

  • model_selection – Model selection to notify about the recorded rewards.

  • deterministic – If True, actions are selected deterministically; if False (the default), they are sampled stochastically.
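
The note on n_episodes can be illustrated with a small, self-contained sketch. It does not use Maze at all: collect_episode_rewards and its parameters are hypothetical names, and the "environments" are just counters that end an episode after a fixed number of steps. The sketch only assumes that the parallel environments are stepped in lockstep and that the stopping condition is checked once per synchronous step, which is one plausible reason the episode count can overshoot.

```python
import random


def collect_episode_rewards(n_envs, n_episodes, episode_length, seed=0):
    """Toy stand-in for a vectorized rollout (hypothetical, not Maze's code).

    Steps all environments in lockstep until at least n_episodes
    episodes have finished, then returns their total rewards.
    """
    rng = random.Random(seed)
    running = [0.0] * n_envs  # reward accumulated in each env's current episode
    steps = [0] * n_envs      # steps taken in each env's current episode
    finished = []             # total rewards of completed episodes

    while len(finished) < n_episodes:
        # One synchronous step across all parallel environments.
        for i in range(n_envs):
            running[i] += rng.random()
            steps[i] += 1
            if steps[i] == episode_length:
                finished.append(running[i])
                running[i], steps[i] = 0.0, 0
    return finished


rewards = collect_episode_rewards(n_envs=4, n_episodes=10, episode_length=5)
mean_reward = sum(rewards) / len(rewards)

# Four environments finish together, so 12 episodes are collected, not 10.
print(len(rewards))
```

Here the four toy environments all finish at the same step, so the count jumps from 8 to 12 and never lands exactly on the requested 10. The mean is then taken over every collected episode.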

evaluate(policy: maze.core.agent.torch_policy.TorchPolicy) → None

(overrides Evaluator)

Evaluates the given policy (results are stored in the stat logs) and dumps the model if the reward improved.

  • policy – Policy to evaluate.
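
The "dump the model if the reward improved" behaviour can be sketched as a simple best-reward tracker. This is only an illustration under stated assumptions: BestRewardTracker, its update method and the saved list are hypothetical and are not taken from Maze's model selection classes. A real implementation would write the policy's state to disk where this sketch merely records the reward.

```python
class BestRewardTracker:
    """Hypothetical stand-in for a model-selection object (not Maze's API)."""

    def __init__(self):
        self.best_reward = float("-inf")
        self.saved = []  # rewards at which a model "dump" would have happened

    def update(self, reward):
        """Record the mean reward and report whether it is a new best."""
        if reward > self.best_reward:
            self.best_reward = reward
            # Placeholder for persisting the model, e.g. saving a state dict.
            self.saved.append(reward)
            return True
        return False


tracker = BestRewardTracker()
# Mean rewards from five successive evaluation runs (made-up values).
improved = [tracker.update(r) for r in [1.0, 0.5, 2.0, 2.0, 3.5]]
print(improved, tracker.saved)
```

Using a strict comparison means a tie with the previous best does not trigger another dump. Whether ties should count as an improvement is a design choice this sketch makes arbitrarily.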