RunContext

class maze.api.run_context.RunContext(run_dir: Optional[str] = None, env: Optional[Union[str, Mapping[str, Any], Callable[[], maze.core.env.maze_env.MazeEnv]]] = None, wrappers: Optional[Union[str, Mapping[str, Any], maze.core.wrappers.wrapper.Wrapper]] = None, algorithm: Optional[Union[str, Mapping[str, Any], maze.train.trainers.common.config_classes.AlgorithmConfig]] = None, model: Optional[Union[str, Mapping[str, Any], maze.perception.models.model_composer.BaseModelComposer]] = None, policy: Optional[Union[str, Mapping[str, Any], maze.perception.models.policies.base_policy_composer.BasePolicyComposer]] = None, critic: Optional[Union[str, Mapping[str, Any], maze.perception.models.critics.base_state_critic_composer.BaseStateCriticComposer]] = None, launcher: Optional[Union[str, Mapping[str, Any], hydra.plugins.launcher.Launcher]] = None, runner: Optional[Union[str, Mapping[str, Any]]] = None, overrides: Optional[Dict[str, Union[Mapping[str, Any], Any]]] = None, configuration: Optional[str] = None, experiment: Optional[str] = None, multirun: bool = False, silent: bool = False)

RunContext offers convenient access to consistently configured training and rollout capabilities with minimal setup. It is still flexible enough to let you manipulate every configurable aspect of Maze. It is initialized via an interface largely congruent with Maze's CLI, but it also accepts instantiated Python objects. Internally it wraps a TrainingRunner and a RolloutRunner object, both initialized according to the specified configuration.

Note: As of now, only training is supported. Rollout will be added soon.

Parameters
  • run_dir – Directory in which to store the outputs of training and rollout processes and from which to read artefacts. This is an alias of hydra.run.dir (i.e. “hydra.run.dir”=x in overrides has the same effect as run_dir=x).

  • env – Environment configuration module name, Hydra configuration or callable returning an instantiated Maze environment. Depending on the chosen runner and/or trainer configuration, multiple environments may have to be instantiated. For this reason env has to be passed as a configuration, a path to a configuration or a factory function, not as an environment instance.

  • wrappers – Wrapper configuration module name, Hydra configuration or instance.

  • algorithm – Algorithm configuration module name, Hydra configuration or instance.

  • model – Model configuration module name, Hydra configuration or instance.

  • policy – Policy configuration module name, Hydra configuration or instance. Part of the model, i.e. setting the policy via this argument or via model.policy in the overrides dictionary is equivalent. Beware: When using a TemplateModelComposer the policy is to be specified without networks.

  • critic – Critic configuration module name, Hydra configuration or instance. Part of the model, i.e. setting the critic via this argument or via model.critic in the overrides dictionary is equivalent. Beware: When using a TemplateModelComposer the critic is to be specified without networks.

  • launcher – Launcher configuration module name, Hydra configuration or instance.

  • runner – Runner configuration module name or Hydra configuration. Support for RolloutRunner configurations will be added once rollouts are fully supported.

  • overrides – Dictionary specifying overrides for individual properties. Overrides may specify values for entire components like environments, for some of their attributes or for specializations. Possible values are Hydra configuration dictionaries as well as instantiated objects. Beware that overrides do not load configuration modules and can only override loaded configuration elements (i.e. overriding X.attr will fail if X is not part of the loaded configuration).

  • configuration – Determines which specialization configuration to load. Possible values: “run” or None. Has to be specified via module name exclusively, e.g. configuration=“test”. This affects the following components: environments, models, algorithms and runners.

  • experiment – Determines which experiment to load. Has to be specified via module name exclusively, e.g. experiment=“x”.

  • multirun – Allows running with multiple configurations (e.g. a grid search).

  • silent – Whether to suppress output to stdout.
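The override rule for the overrides parameter is that overrides only modify configuration elements that are already loaded and never load new modules. A plain nested dictionary can illustrate this. The helper apply_override and the configuration contents below are hypothetical. They are not Maze's or Hydra's implementation, and Hydra's actual override handling is more involved.

```python
from typing import Any, Dict


def apply_override(config: Dict[str, Any], dotted_key: str, value: Any) -> None:
    """Hypothetical helper: set `value` at `dotted_key` (e.g. "model.critic").

    Like the documented rule, it only touches the already loaded configuration:
    if a parent element of the key is missing, it raises instead of loading it.
    """
    *parents, leaf = dotted_key.split(".")
    node: Any = config
    for part in parents:
        if not isinstance(node, dict) or part not in node:
            raise KeyError(f"'{part}' is not part of the loaded configuration")
        node = node[part]
    if not isinstance(node, dict):
        raise KeyError(f"cannot set '{leaf}' on a non-dict element")
    node[leaf] = value


# Hypothetical loaded configuration (values are placeholders).
loaded = {
    "hydra": {"run": {"dir": "outputs"}},
    "env": {"name": "CartPole-v0"},
    "model": {"policy": None, "critic": None},
}

# Same effect as passing run_dir="my_run" to RunContext.
apply_override(loaded, "hydra.run.dir", "my_run")
# Same effect as passing the critic configuration via the critic argument.
apply_override(loaded, "model.critic", {"name": "my_critic"})

try:
    # Fails: "launcher" was never loaded, and overrides do not load modules.
    apply_override(loaded, "launcher.n_jobs", 4)
except KeyError as err:
    print(err)
```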

compute_action(observation: Dict[str, numpy.ndarray], maze_state: Optional[Any] = None, env: Optional[maze.core.env.base_env.BaseEnv] = None, actor_id: maze.core.env.structured_env.ActorID = None, deterministic: bool = False) → Union[Dict[str, Union[int, numpy.ndarray]], List[Dict[str, Union[int, numpy.ndarray]]]]

Computes action(s) with configured policy/policies. This wraps maze.core.agent.policy.Policy.compute_action().

Returns

Computed action(s) for the next step. If run in single mode, the list is collapsed to a single action instance.
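This collapsing behaviour also applies to policy, env_factory, evaluate and run_dir. It follows a simple pattern. collapse_results is a hypothetical helper, and basing the decision on the multirun flag is an assumption made for this sketch.

```python
from typing import List, TypeVar, Union

T = TypeVar("T")


def collapse_results(results: List[T], multirun: bool) -> Union[T, List[T]]:
    """Hypothetical helper: one result per run, collapsed unless in multi-run mode."""
    if multirun:
        return results
    if len(results) != 1:
        raise ValueError("single mode is expected to yield exactly one result")
    return results[0]


# Single mode: the action dictionary itself is returned.
single = collapse_results([{"action": 1}], multirun=False)
# Multi-run mode (e.g. a grid search): one action per run.
multi = collapse_results([{"action": 0}, {"action": 1}], multirun=True)
```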

property config

Returns Hydra DictConfigs specifying the configuration for training and rollout runners.

Returns

Dictionaries with the DictConfig(s) for training and for rollout. Note that configurations are initialized lazily, i.e. they are not available until the first training or rollout is initiated.
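Lazy initialization can be sketched as follows. The sketch assumes that an unavailable configuration is reported as None. LazyConfigHolder and its placeholder contents are hypothetical and only mimic the documented behaviour.

```python
from typing import Any, Dict, Optional


class LazyConfigHolder:
    """Hypothetical stand-in mimicking lazily initialized run configurations."""

    def __init__(self) -> None:
        self._configs: Optional[Dict[str, Any]] = None

    @property
    def config(self) -> Optional[Dict[str, Any]]:
        # Stays None until the first training or rollout has been initiated.
        return self._configs

    def train(self) -> None:
        if self._configs is None:
            # Configurations are resolved on first use only (placeholder values).
            self._configs = {"train": {"n_epochs": 1}, "rollout": None}


holder = LazyConfigHolder()
before = holder.config  # None: nothing has been initiated yet
holder.train()
after = holder.config  # now populated
```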

property env_factory

Returns a factory function that generates a new environment, with wrappers applied w.r.t. the specified configuration.

Returns

Environment factory function(s). One factory function if RunContext doesn’t operate in multi-run mode, otherwise a list thereof.
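Several environments may be needed, as noted for the env parameter. For this reason a factory is a zero-argument callable, matching the Callable[[], MazeEnv] form in the signature, that creates a fresh environment on every call. DummyEnv, make_env_factory and the environment and wrapper names are hypothetical stand-ins used for illustration.

```python
from typing import Callable, List


class DummyEnv:
    """Hypothetical stand-in for a wrapped Maze environment."""

    def __init__(self, name: str, wrappers: List[str]) -> None:
        self.name = name
        self.wrappers = list(wrappers)


def make_env_factory(name: str, wrappers: List[str]) -> Callable[[], DummyEnv]:
    """Bind the configuration once; every call of the result builds a new env."""

    def factory() -> DummyEnv:
        return DummyEnv(name, wrappers)

    return factory


env_factory = make_env_factory("my_env", ["my_wrapper"])
# E.g. one independent instance per parallel rollout worker.
envs = [env_factory() for _ in range(4)]
```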

evaluate(**eval_kwargs) → Union[Dict[maze.core.log_stats.log_stats.LogStatsKey, Union[int, float, numpy.ndarray, dict]], List[Dict[maze.core.log_stats.log_stats.LogStatsKey, Union[int, float, numpy.ndarray, dict]]]]

Evaluates the trained/loaded policy with a RolloutEvaluator. By default, 8 episodes are evaluated sequentially.

Parameters

eval_kwargs – Keyword arguments that override the configured (or default) initialization parameters of the RolloutEvaluator. Note that these arguments are ignored if a RolloutEvaluator instance was passed in AlgorithmConfig.

Returns

Logged statistics. One LogStats object if RunContext doesn’t operate in multi-run mode, otherwise a list thereof.
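The default evaluation scheme runs a fixed number of episodes one after another. A minimal sketch follows. CountdownEnv, evaluate_sequentially, the n_episodes parameter name and the string-keyed result dictionary are all assumptions. evaluate() itself returns statistics keyed by LogStatsKey.

```python
import statistics
from typing import Callable, Dict, List, Tuple


class CountdownEnv:
    """Hypothetical toy env: episodes last `length` steps with reward 1.0 per step."""

    def __init__(self, length: int = 5) -> None:
        self.length = length
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action: int) -> Tuple[int, float, bool, dict]:
        self.t += 1
        return self.t, 1.0, self.t >= self.length, {}


def evaluate_sequentially(env_factory: Callable[[], CountdownEnv],
                          policy: Callable[[int], int],
                          n_episodes: int = 8) -> Dict[str, float]:
    """Run `n_episodes` episodes one after another and summarize their returns."""
    env = env_factory()
    returns: List[float] = []
    for _ in range(n_episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            obs, reward, done, _ = env.step(policy(obs))
            total += reward
        returns.append(total)
    return {
        "episode_count": float(n_episodes),
        "mean_return": statistics.mean(returns),
        "min_return": min(returns),
        "max_return": max(returns),
    }


# Default of 8 episodes; a keyword argument overrides it, similar to eval_kwargs.
stats_default = evaluate_sequentially(lambda: CountdownEnv(5), lambda obs: 0)
stats_short = evaluate_sequentially(lambda: CountdownEnv(5), lambda obs: 0, n_episodes=2)
```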

property policy

Returns policy/policies.

Returns

Policy/policies used for training and rollout. If run in single mode, the list is collapsed to a single Policy instance.

property run_dir

Returns run directory/directories.

Returns

Run directory/directories. Note that run directories are initialized lazily, i.e. they are not available until the first training or rollout is initiated. If run in single mode, the list is collapsed to a single run directory string.

train(n_epochs: Optional[int] = None, **train_kwargs) → None

Trains for the specified number of epochs. After training, the trainer is reset to the overall best state encountered.

Parameters
  • n_epochs – Number of epochs to train for.

  • train_kwargs – Arguments to pass on to run().
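Resetting to the best state instead of the last one can be sketched as a loop that takes a snapshot of the state whenever the evaluation score improves. train_keep_best and toy_epoch are hypothetical. This document does not specify how Maze's trainers score or store their best state.

```python
import copy
from typing import Callable, Dict


def train_keep_best(state: Dict[str, float],
                    train_epoch: Callable[[Dict[str, float]], float],
                    n_epochs: int) -> Dict[str, float]:
    """Train for `n_epochs`, then restore the best-scoring state (hypothetical)."""
    best_score = float("-inf")
    best_state = copy.deepcopy(state)
    for _ in range(n_epochs):
        score = train_epoch(state)  # updates `state` in place, returns its score
        if score > best_score:
            best_score = score
            best_state = copy.deepcopy(state)
    # Reset to the overall best state encountered, not the last one.
    state.clear()
    state.update(best_state)
    return state


scores = [1.0, 3.0, 2.0]  # toy evaluation score after epochs 1, 2 and 3


def toy_epoch(state: Dict[str, float]) -> float:
    state["w"] += 1.0
    return scores[int(state["w"]) - 1]


final = train_keep_best({"w": 0.0}, toy_epoch, n_epochs=3)
print(final)  # {'w': 2.0}: epoch 2 scored best, so epoch 3's update is discarded
```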