class maze.core.rollout.rollout_runner.RolloutRunner(n_episodes: int, max_episode_steps: int, deterministic: bool, record_trajectory: bool, record_event_logs: bool)

General abstract class for rollout runners.

Provides the general structure, plus helper methods for instantiating the environment and performing the rollout.

  • n_episodes – Count of episodes to run. If explicit seeds are given, the actual number of episodes is min(n_episodes, n_seeds).

  • max_episode_steps – Maximum number of steps to run in each episode (the episode finishes earlier if the environment returns done).

  • deterministic – Whether to sample actions deterministically (True) or stochastically (False).

  • record_trajectory – Whether to record trajectory data.

  • record_event_logs – Whether to record event logs.
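The interplay between n_episodes and explicit seeds described above can be sketched as follows. The helper name effective_episode_count is hypothetical and not part of Maze; it only mirrors the min(n_episodes, n_seeds) rule stated in the parameter description.

```python
from typing import Optional, Sequence


def effective_episode_count(n_episodes: int, seeds: Optional[Sequence[int]] = None) -> int:
    """Hypothetical helper (not part of Maze): number of episodes actually run."""
    if seeds is None:
        # No explicit seeds: the requested episode count is used as-is.
        return n_episodes
    # Explicit seeds cap the run at one episode per seed.
    return min(n_episodes, len(seeds))
```

Under this rule, requesting 10 episodes with 3 explicit seeds would run 3 episodes, while requesting 2 episodes with the same seeds would run 2.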

static init_env_and_agent(env_config: omegaconf.DictConfig, wrappers_config: Union[List[Union[None, Mapping[str, Any], Any]], Mapping[Union[str, Type], Union[None, Mapping[str, Any], Any]]], max_episode_steps: int, agent_config: omegaconf.DictConfig, input_dir: str) -> (<class 'maze.core.env.maze_env.MazeEnv'>, <class 'maze.core.agent.policy.Policy'>)

Build the environment (including wrappers) and agent according to given configuration.

  • env_config – Environment config.

  • wrappers_config – Wrapper config.

  • max_episode_steps – Max number of steps per episode to limit the env for.

  • agent_config – Policies config.

  • input_dir – Directory to load the model from.


Returns: Tuple of (instantiated environment, instantiated agent).


(overrides Runner)

Parse the supplied Hydra config and perform the run.

static run_episode(env: maze.core.env.structured_env.StructuredEnv, obs: Dict[str, numpy.ndarray], agent: maze.core.agent.policy.Policy, deterministic: bool, render: bool) -> None

Helper function for running a single episode.

  • env – Environment to run.

  • obs – Initial observation, as returned by reset().

  • deterministic – Whether to select actions deterministically (argmax policy) instead of sampling them.

  • agent – Agent to use.

  • render – Whether to render the environment after every step.
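A single-episode loop of the kind described above might look roughly like the sketch below. It is not Maze's implementation: CountdownEnv, ConstantAgent, run_episode_sketch and the simplified compute_action(obs, deterministic) signature are hypothetical stand-ins chosen so the example runs without Maze installed.

```python
from typing import Dict, Tuple

Observation = Dict[str, float]


class CountdownEnv:
    """Toy stand-in environment (hypothetical): reports done after `horizon` steps."""

    def __init__(self, horizon: int) -> None:
        self.horizon = horizon
        self.steps_taken = 0

    def reset(self) -> Observation:
        self.steps_taken = 0
        return {"remaining": float(self.horizon)}

    def step(self, action: int) -> Tuple[Observation, float, bool, dict]:
        self.steps_taken += 1
        remaining = self.horizon - self.steps_taken
        return {"remaining": float(remaining)}, 1.0, remaining <= 0, {}

    def render(self) -> None:
        pass  # Nothing to draw in this toy example.


class ConstantAgent:
    """Toy stand-in agent (hypothetical): always picks action 0."""

    def compute_action(self, obs: Observation, deterministic: bool) -> int:
        return 0


def run_episode_sketch(env, obs, agent, deterministic: bool, render: bool) -> None:
    """Step from the initial observation `obs` until the environment reports done."""
    done = False
    while not done:
        action = agent.compute_action(obs, deterministic=deterministic)
        obs, _reward, done, _info = env.step(action)
        if render:
            env.render()


env = CountdownEnv(horizon=3)
run_episode_sketch(env, env.reset(), ConstantAgent(), deterministic=True, render=False)
```

The sketch has no step counter of its own: it relies on the environment to signal done, which matches the description of max_episode_steps as a limit applied to the env.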

classmethod run_interaction_loop(env: maze.core.env.structured_env.StructuredEnv, agent: maze.core.agent.policy.Policy, n_episodes: int, env_seeds: List[Any], agent_seeds: List[Any], deterministic: bool, render: bool = False, after_reset_callback: Callable = None) -> None

Helper function for running the agent-environment interaction loop for the specified number of steps and episodes.

  • env – Environment to run.

  • agent – Agent to use.

  • n_episodes – Count of episodes to perform.

  • env_seeds – The env seeds to be used for each episode.

  • agent_seeds – The agent seeds to be used for each episode.

  • render – Whether to render the environment after every step.

  • after_reset_callback – If supplied, this will be executed after each episode to notify the observer.
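The seeded interaction loop described by these parameters can be approximated as shown below. All class and function names are hypothetical stand-ins, not Maze APIs. The sketch invokes the callback right after each reset, which is an assumption drawn from the parameter name, and it caps the episode count by the number of seeds, following the rule stated for n_episodes.

```python
from typing import Callable, List, Optional


class SeededEnv:
    """Toy stand-in environment (hypothetical): each episode lasts two steps."""

    def __init__(self) -> None:
        self.seen_seeds: List[int] = []
        self._steps = 0

    def seed(self, seed: int) -> None:
        self.seen_seeds.append(seed)

    def reset(self) -> dict:
        self._steps = 0
        return {}

    def step(self, action: int):
        self._steps += 1
        return {}, 0.0, self._steps >= 2, {}


class SeededAgent:
    """Toy stand-in agent (hypothetical) that records the seeds it was given."""

    def __init__(self) -> None:
        self.seen_seeds: List[int] = []

    def seed(self, seed: int) -> None:
        self.seen_seeds.append(seed)

    def compute_action(self, obs: dict, deterministic: bool) -> int:
        return 0


def interaction_loop_sketch(env, agent, n_episodes: int, env_seeds: List[int],
                            agent_seeds: List[int], deterministic: bool,
                            after_reset_callback: Optional[Callable] = None) -> int:
    """Run one episode per seed pair and return the number of episodes run."""
    episodes = min(n_episodes, len(env_seeds), len(agent_seeds))
    for env_seed, agent_seed in zip(env_seeds[:episodes], agent_seeds[:episodes]):
        env.seed(env_seed)
        agent.seed(agent_seed)
        obs = env.reset()
        if after_reset_callback is not None:
            after_reset_callback()  # Assumption: notify the observer after reset.
        done = False
        while not done:
            action = agent.compute_action(obs, deterministic)
            obs, _reward, done, _info = env.step(action)
    return episodes


loop_env, loop_agent, callback_calls = SeededEnv(), SeededAgent(), []
episodes_run = interaction_loop_sketch(
    loop_env, loop_agent, n_episodes=5, env_seeds=[1, 2, 3], agent_seeds=[4, 5, 6],
    deterministic=True, after_reset_callback=lambda: callback_calls.append(1),
)
```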

abstract run_with(env: Union[None, Mapping[str, Any], Any], wrappers: Union[List[Union[None, Mapping[str, Any], Any]], Mapping[Union[str, Type], Union[None, Mapping[str, Any], Any]]], agent: Union[None, Mapping[str, Any], Any]) -> None

Run the rollout with the given env, wrappers and agent configuration. A helper method that makes rollouts easy to run directly from Python, without building the Hydra config object.

Note that this method is designed to run only once. If you call it directly from Python (rather than through Hydra from the command line, which is the main use case), you should respect this. Otherwise, you might get unexpected behavior, especially from the statistics and event logging system: the rollout runners register their own stats and event writers (so you might get duplicate stats), and the order of operations sometimes matters (especially with parallel rollouts, where the writers should not be carried into child processes).

  • env – Env config or object.

  • wrappers – Wrappers config (see WrapperFactory).

  • agent – Agent config or object.
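Because run_with is designed to run only once, calling code may want to guard against accidental repeated calls. The sketch below shows one way to do that on a stand-in abstract class. RunnerSketch, run_once and RecordingRunner are hypothetical names; Maze itself is not stated to enforce such a guard.

```python
from abc import ABC, abstractmethod
from typing import Any, List, Tuple


class RunnerSketch(ABC):
    """Hypothetical stand-in for an abstract rollout runner (not the Maze class)."""

    def __init__(self) -> None:
        self._has_run = False

    def run_once(self, env: Any, wrappers: Any, agent: Any) -> None:
        """Refuse a second call so that stats and event writers are not registered twice."""
        if self._has_run:
            raise RuntimeError("This runner has already been run.")
        self._has_run = True
        self.run_with(env, wrappers, agent)

    @abstractmethod
    def run_with(self, env: Any, wrappers: Any, agent: Any) -> None:
        """Concrete runners implement the actual rollout here."""


class RecordingRunner(RunnerSketch):
    """Concrete toy runner that only records the arguments it receives."""

    def __init__(self) -> None:
        super().__init__()
        self.calls: List[Tuple[Any, Any, Any]] = []

    def run_with(self, env: Any, wrappers: Any, agent: Any) -> None:
        self.calls.append((env, wrappers, agent))


runner = RecordingRunner()
runner.run_once(env="env-config", wrappers={}, agent="agent-config")
try:
    runner.run_once(env="env-config", wrappers={}, agent="agent-config")
    second_call_rejected = False
except RuntimeError:
    second_call_rejected = True
```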

setup(cfg: omegaconf.DictConfig) -> None

(overrides Runner)

Sets up prerequisites for rollouts.

  • cfg – DictConfig defining components to initialize.