RolloutEvaluator

class maze.train.trainers.common.evaluators.rollout_evaluator.RolloutEvaluator(eval_env: StructuredVectorEnv, n_episodes: int, model_selection: ModelSelectionBase | None, deterministic: bool = False)

Evaluates a given policy by rolling it out and collecting the mean reward.

Parameters:

eval_env – Distributed environment to run evaluation rollouts in.
n_episodes – Number of evaluation episodes to run. Note that the actual number might be slightly larger due to the distributed nature of the environment.
model_selection – Model selection to notify about the recorded rewards.
deterministic – Whether actions are selected deterministically (True) or sampled stochastically (False, the default).
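The note on n_episodes can be made concrete with a toy calculation. This is only a sketch under a simplifying assumption: all parallel environments run in lockstep and each finishes exactly one episode per round. The helper name episodes_actually_run and the n_envs argument are hypothetical and not part of the Maze API; the real evaluator may count episodes differently.

```python
import math


def episodes_actually_run(n_episodes: int, n_envs: int) -> int:
    """Toy model: number of episodes completed when at least `n_episodes` are requested.

    Assumption (not taken from Maze): each of the `n_envs` parallel environments
    finishes one episode per round, and rounds repeat until the requested
    number of episodes has been reached.
    """
    if n_envs <= 0:
        raise ValueError("n_envs must be positive")
    if n_episodes <= 0:
        return 0
    rounds = math.ceil(n_episodes / n_envs)
    return rounds * n_envs


# Requesting 10 episodes from 4 parallel environments takes 3 rounds.
print(episodes_actually_run(10, 4))  # → 12
```

Under this assumption the overshoot is always smaller than the number of parallel environments, which is why the actual episode count is only slightly larger than requested.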

evaluate(policy: TorchPolicy) → None

(overrides Evaluator)

Evaluate the given policy (results are stored in the stat logs) and dump the model if the reward improved.

Parameters:

policy – Policy to evaluate.
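The "dump the model if the reward improved" behaviour follows a common pattern that can be sketched without Maze itself. The class BestRewardTracker and the function evaluate_once below are hypothetical stand-ins for illustration; they are not Maze's ModelSelectionBase or RolloutEvaluator, and the sketch assumes that "improved" means a strictly higher mean reward than any seen so far.

```python
from statistics import mean
from typing import Callable, List, Optional


class BestRewardTracker:
    """Hypothetical stand-in for a model-selection object (not Maze's API)."""

    def __init__(self, dump_fn: Callable[[], None]) -> None:
        self.best_reward: Optional[float] = None
        self.dump_fn = dump_fn  # called whenever a new best reward is seen

    def update(self, reward: float) -> bool:
        """Record `reward`; dump the model and return True if it is a new best."""
        improved = self.best_reward is None or reward > self.best_reward
        if improved:
            self.best_reward = reward
            self.dump_fn()
        return improved


def evaluate_once(episode_rewards: List[float], tracker: BestRewardTracker) -> float:
    """Reduce per-episode rewards to their mean and notify the tracker."""
    mean_reward = mean(episode_rewards)
    tracker.update(mean_reward)
    return mean_reward


dumps: List[float] = []
tracker = BestRewardTracker(dump_fn=lambda: dumps.append(1.0))
evaluate_once([1.0, 3.0], tracker)  # mean 2.0, first result, so the model is dumped
evaluate_once([0.0, 2.0], tracker)  # mean 1.0, no improvement, nothing is dumped
print(len(dumps))  # → 1
```

Keeping the improvement check in a separate object mirrors the constructor signature above, where model_selection may be None when no model should be dumped.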