RolloutGenerator(env: Union[maze.core.env.maze_env.MazeEnv, maze.train.parallelization.vector_env.structured_vector_env.StructuredVectorEnv], record_logits: bool = False, record_step_stats: bool = False, record_episode_stats: bool = False, record_next_observations: bool = False, terminate_on_done: bool = False)
Rolls out a given policy in a given environment, recording the trajectory (in the form of raw actions and observations).
Works with both standard and vectorized environments.
env – Environment to run rollouts in. Will be reset before the first rollout.
record_logits – Whether to record the policy logits.
record_step_stats – Whether to record step statistics.
record_episode_stats – Whether to record episode stats (happens only when an episode is done).
record_next_observations – Whether to record the next observation (i.e. the observation that follows the action taken).
terminate_on_done – Whether to end the rollout when the env is done (by default resets the env and continues until the desired number of steps has been recorded). Only applicable in non-vectorized scenarios.
get_epoch_stats_aggregator() → maze.core.log_stats.log_stats.LogStatsAggregator
Return the collected epoch stats aggregator.
get_stats_value(event: Callable, level: maze.core.log_stats.log_stats.LogStatsLevel, name: Optional[str] = None) → Union[int, float, numpy.ndarray, dict]
Obtain a single value from the epoch statistics dict.
event – The event interface method of the value in question.
name – The output_name of the statistics in case it has been specified in
level – Must be set to LogStatsLevel.EPOCH; step and episode statistics are not propagated.
rollout(policy: maze.core.agent.policy.Policy, n_steps: Optional[int], trajectory_id: Optional[Any] = None) → maze.core.trajectory_recording.records.trajectory_record.SpacesTrajectoryRecord
Perform and record a rollout with the given policy, for the given number of steps or until done.
Note that the env is reset only on the very first rollout with this generator; subsequent rollouts pick up where the previous one left off. If required, you can avoid the initial reset by assigning the last observation (which will be recorded with the first step) to self.last_observation.
policy – Policy to roll out.
n_steps – How many steps to perform. If None, the rollout continues until the env reports done=True.
trajectory_id – Optionally, the ID of the trajectory that we are recording.
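The reset and continuation semantics described for rollout can be illustrated with a small self-contained sketch. This is not the Maze implementation: CountingEnv and ToyRolloutGenerator are hypothetical stand-ins written for this example, the policy is a plain callable, and records are simple tuples rather than a SpacesTrajectoryRecord. The sketch only mirrors the documented behaviour: reset before the first rollout, continue from the last observation afterwards, and, for a non-vectorized env, either reset and keep going or stop when done.

```python
from typing import Any, Callable, List, Optional, Tuple


class CountingEnv:
    """Hypothetical toy env: the observation is a step counter, done after `episode_len` steps."""

    def __init__(self, episode_len: int = 3) -> None:
        self.episode_len = episode_len
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action: Any) -> Tuple[int, float, bool, dict]:
        self.t += 1
        return self.t, 1.0, self.t >= self.episode_len, {}


class ToyRolloutGenerator:
    """Simplified stand-in for the documented semantics (assumption, not Maze code)."""

    def __init__(self, env: CountingEnv, terminate_on_done: bool = False) -> None:
        self.env = env
        self.terminate_on_done = terminate_on_done
        self.last_observation: Optional[int] = None

    def rollout(self, policy: Callable[[int], Any],
                n_steps: Optional[int]) -> List[Tuple[int, Any, float, bool]]:
        # Reset only if no observation is available yet, i.e. on the first rollout
        # (or never, if the caller assigned last_observation beforehand).
        if self.last_observation is None:
            self.last_observation = self.env.reset()

        records: List[Tuple[int, Any, float, bool]] = []
        while n_steps is None or len(records) < n_steps:
            obs = self.last_observation
            action = policy(obs)
            next_obs, reward, done, _ = self.env.step(action)
            records.append((obs, action, reward, done))
            if done:
                if self.terminate_on_done or n_steps is None:
                    # Toy choice: keep the terminal observation and stop here.
                    self.last_observation = next_obs
                    break
                # Default behaviour: reset and keep collecting steps.
                next_obs = self.env.reset()
            self.last_observation = next_obs
        return records


generator = ToyRolloutGenerator(CountingEnv(episode_len=3))
first = generator.rollout(lambda obs: 0, n_steps=4)   # crosses an episode boundary
second = generator.rollout(lambda obs: 0, n_steps=2)  # continues, no fresh reset
print([r[0] for r in first], [r[0] for r in second])
```

In this toy setup the second rollout starts from the observation the first one ended on, which is the "pick up where the previous one left off" behaviour described above.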