Maze RLlib Runner

The RLlib Runner lets you use RLlib Trainers in combination with Maze models and environments. Ray RLlib is one of the most popular RL frameworks (algorithm collections), both in the scientific community and in practical applications. It comprises an extensive, well-tuned collection of RL training algorithms. To give access to RLlib’s algorithm collection while retaining all practical Maze features, we introduce the Maze RLlib Module. It wraps Maze models (including our extensive Perception Module), Maze environments (including wrappers), and the customizable Maze action distributions. It also lets you use the Maze Hydra command-line interface together with RLlib’s well-optimized algorithms.

This page gives an overview of the RLlib module and provides examples on how to apply it.


List of Features

  • Use Maze environments, models and action distributions in conjunction with RLlib algorithms.

  • Make full use of the Maze environment customization utils (wrappers, pre-processing, …).

  • Use the hydra cmd-line interface to start training runs.

  • Models trained with the Maze RLlib Runner are fully compatible with the remaining framework (except when using the default RLlib models).

Example 1: Training with Maze-RLlib and Hydra

Using RLlib algorithms with Maze and Hydra works analogously to starting training with native Maze Trainers. To train the CartPole environment with RLlib’s PPO, run:

$ maze-run -cn conf_rllib rllib/algorithm=ppo

Here, the -cn conf_rllib argument selects conf_rllib.yaml (available in the maze-rllib package) as the root config file. This file specifies how RLlib trainers are used within Maze. (For more on root configuration files, see Hydra overview.)

Example 2: Overwriting Training Parameters

Similar to native Maze trainers, RLlib training runs are parametrized via Hydra. The main parameters for customizing training are:

  • Environment (env configuration group), configuring which environment the training runs on; this is the same as, for example, in maze-train.

  • Algorithm (rllib/algorithm configuration group), specifies the algorithm and its configuration (all supported algorithms).

  • Model (model configuration group), specifying how the models for policies and (optionally) critics should be assembled; this is also the same as in maze-train.

  • Runner (rllib/runner configuration group), specifies how training is run (e.g. locally, in development mode). The runner is also the main object responsible for administering the whole training run.

To train with a different algorithm we simply have to specify the rllib/algorithm parameter:

$ maze-run -cn conf_rllib rllib/algorithm=a3c

Furthermore, we have full access to the algorithm hyperparameters defined by RLlib and can overwrite them. For example, to change the rollout fragment length, execute the following command; other hyperparameters, such as the learning rate, can be overwritten in the same way.

$ maze-run -cn conf_rllib rllib/algorithm=a3c \
  algorithm.config.rollout_fragment_length=50
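
To make the dotted override syntax more concrete, the following minimal sketch shows how a key such as algorithm.config.rollout_fragment_length maps onto a nested configuration. It is an illustration only, not Hydra's actual implementation: the helpers parse_value and apply_override are hypothetical, the value parsing is a simplified stand-in for Hydra's override grammar, and the starting values in the config dict are placeholders.

```python
import ast
from typing import Any, Dict


def parse_value(raw: str) -> Any:
    """Interpret a raw override value; fall back to the plain string."""
    try:
        # Handles numbers, lists such as [128,128], and True/False.
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        # Bare words such as "a3c" are kept as strings.
        return raw


def apply_override(config: Dict[str, Any], override: str) -> None:
    """Apply one 'dotted.key=value' override to a nested dict in place."""
    dotted_key, raw_value = override.split("=", 1)
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        # Walk down the hierarchy, creating missing levels on the way.
        node = node.setdefault(name, {})
    node[leaf] = parse_value(raw_value)


# Hypothetical excerpt of an algorithm config; the values are placeholders.
config = {"algorithm": {"config": {"rollout_fragment_length": 20, "num_workers": 2}}}
apply_override(config, "algorithm.config.rollout_fragment_length=50")
print(config["algorithm"]["config"])
```

Only the addressed leaf changes; sibling keys in the same group keep their values.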

Example 3: Training with RLlib’s Default Models

Finally, it is also possible to use the RLlib default model builder by specifying model=rllib. This loads the RLlib default model and its parameters, which can again be customized via Hydra:

$ maze-run -cn conf_rllib model=rllib \
  model.fcnet_hiddens=[128,128] model.vf_share_layers=False
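
When several runs need to be scripted, the commands from these examples can also be assembled in Python. The sketch below is one possible way to do this; build_maze_run_command is a hypothetical helper, not part of Maze, and it only uses the configuration groups and values that appear in the examples above.

```python
import shlex
from typing import Dict, List, Optional


def build_maze_run_command(
    algorithm: Optional[str] = None,
    runner: Optional[str] = None,
    model: Optional[str] = None,
    overrides: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Assemble the argument list for a maze-run call using conf_rllib."""
    command = ["maze-run", "-cn", "conf_rllib"]
    if algorithm is not None:
        command.append(f"rllib/algorithm={algorithm}")
    if runner is not None:
        command.append(f"rllib/runner={runner}")
    if model is not None:
        command.append(f"model={model}")
    # Individual parameter overrides are appended as key=value pairs.
    for key, value in (overrides or {}).items():
        command.append(f"{key}={value}")
    return command


command = build_maze_run_command(
    model="rllib",
    overrides={
        "model.fcnet_hiddens": "[128,128]",
        "model.vf_share_layers": "False",
    },
)
# shlex.join quotes arguments where needed, e.g. the bracketed list value.
print(shlex.join(command))
# To launch the run, pass the list to subprocess.run(command, check=True).
```

Passing the arguments as a list avoids shell quoting issues with values such as [128,128], which some shells would otherwise try to expand.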

The Bigger Picture

The figure below shows an overview of how the RLlib Module connects to the different Maze components in more detail:


Good to Know


Using the argument rllib/runner=dev starts Ray in local mode, sets the number of workers to 1 by default, and increases the log level (resulting in more information being printed). This is especially useful for debugging.


When watching the progress of RLlib training runs with TensorBoard, make sure to start TensorBoard with --reload_multifile true, as both Maze and RLlib write an event log.

Where to Go Next