Maze RLlib Runner
The RLlib Runner allows you to use RLlib Trainers in combination with Maze models and environments. Ray RLlib is one of the most popular RL frameworks (algorithm collections), both within the scientific community and in practical applications. It already comprises an extensive and well-tuned collection of RL training algorithms. To gain access to RLlib's algorithm collection while keeping all of Maze's practical features, we introduce the Maze RLlib Module. It wraps Maze models (including our extensive Perception Module), Maze environments (including wrappers), and the customizable Maze action distributions. It also lets us use the Maze Hydra command-line interface together with RLlib's well-optimized algorithms.
This page gives an overview of the RLlib module and provides examples on how to apply it.
List of Features
Use Maze environments, models and action distributions in conjunction with RLlib algorithms.
Make full use of the Maze environment customization utils (wrappers, pre-processing, …).
Use the hydra cmd-line interface to start training runs.
Models trained with the Maze RLlib Runner are fully compatible with the remaining framework (except when using the default RLlib models).
Example 1: Training with Maze-RLlib and Hydra
$ maze-run -cn conf_rllib env.name=CartPole-v0 rllib/algorithm=ppo
The -cn conf_rllib argument selects conf_rllib (from the maze-rllib package) as our root config file.
It specifies how RLlib trainers are used within Maze.
(For more on root configuration files, see Hydra overview.)
Example 2: Overwriting Training Parameters
Similar to native Maze trainers, the parametrization of RLlib training runs is also done via Hydra. The main parameters for customizing training are:
env configuration group: configures which environment the training runs on; this stays the same as in, for example, maze-train.
rllib/algorithm configuration group: specifies the algorithm and its configuration (all supported algorithms).
model configuration group: specifies how the models for policies and (optionally) critics should be assembled; this also stays the same as in maze-train.
rllib/runner configuration group: specifies how training is run (e.g. locally, in development mode). The runner is also the main object responsible for administering the whole training run.
To train with a different algorithm we simply have to specify the rllib/algorithm parameter:
$ maze-run -cn conf_rllib env.name=CartPole-v0 rllib/algorithm=a3c
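When comparing several algorithms it can be convenient to assemble such command lines from a script. The following is a minimal sketch, not part of Maze: the helper name maze_run_command is hypothetical, and it only uses the arguments shown on this page. The commands are printed rather than executed.

```python
import shlex


def maze_run_command(env_name, algorithm, extra_overrides=()):
    """Build a maze-run argument list (hypothetical helper, not a Maze API)."""
    return [
        "maze-run",
        "-cn", "conf_rllib",               # root config, as in the examples on this page
        f"env.name={env_name}",            # environment override
        f"rllib/algorithm={algorithm}",    # algorithm configuration group
        *extra_overrides,                  # any further Hydra overrides
    ]


# Print one command per algorithm mentioned on this page.
for algorithm in ("ppo", "a3c"):
    command = maze_run_command("CartPole-v0", algorithm)
    print(shlex.join(command))
    # To actually launch a run, one could pass `command` to subprocess.run(command, check=True).
```

Building the command as a list keeps each override a separate argument, so no manual shell quoting is needed if it is later passed to subprocess.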
Furthermore, we have full access to the algorithm hyperparameters defined by RLlib and can overwrite them. For example, to change the learning rate and rollout fragment length, execute
$ maze-run -cn conf_rllib env.name=CartPole-v0 rllib/algorithm=a3c \
    algorithm.config.lr=0.001 algorithm.config.rollout_fragment_length=50
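To illustrate what such dotted overrides mean, here is a rough sketch in plain Python of how a key like algorithm.config.lr addresses a value in a nested configuration. This is not Hydra's actual implementation; the function apply_overrides and the default values are assumptions made purely for illustration.

```python
import ast
from copy import deepcopy


def apply_overrides(config, overrides):
    """Return a copy of `config` with dotted `key=value` overrides applied.

    Illustrative only: Hydra's real override grammar is richer than this.
    """
    result = deepcopy(config)
    for item in overrides:
        dotted_key, raw_value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            # Walk down the nested dicts, creating levels that do not exist yet.
            node = node.setdefault(name, {})
        try:
            # Interpret numbers, booleans and lists such as [128,128].
            value = ast.literal_eval(raw_value)
        except (ValueError, SyntaxError):
            # Anything else (e.g. CartPole-v0) is kept as a plain string.
            value = raw_value
        node[leaf] = value
    return result


# Hypothetical defaults; the numbers are placeholders, not RLlib's real defaults.
defaults = {"algorithm": {"config": {"lr": 0.0001, "rollout_fragment_length": 10}}}

updated = apply_overrides(
    defaults,
    ["algorithm.config.lr=0.001", "algorithm.config.rollout_fragment_length=50"],
)
print(updated["algorithm"]["config"])
```

Each dot in the key descends one level in the configuration, so only the addressed leaf changes and all sibling settings keep their configured values.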
Example 3: Training with RLlib's Default Models
Finally, it is also possible to use the RLlib default model builder by specifying model=rllib.
This loads the RLlib default model and its parameters, which can again be customized via Hydra:
$ maze-run -cn conf_rllib env.name=CartPole-v0 model=rllib \
    model.fcnet_hiddens=[128,128] model.vf_share_layers=False
The Bigger Picture
The figure below shows an overview of how the RLlib Module connects to the different Maze components in more detail:
Good to Know
Using the argument rllib/runner=dev starts Ray in local mode, sets the number of workers to 1 by default,
and increases the log level (resulting in more information being printed). This is especially useful for debugging.
When watching the training progress of RLlib training runs with TensorBoard,
make sure to start TensorBoard with --reload_multifile true, as both Maze and RLlib will dump an event log.
Where to Go Next
After training, you might want to roll out the trained policy to further evaluate it or record the actions taken.
To build and use custom Maze models please refer to Maze Perception Module.
For more details on Hydra and how to use it go to configuration with Hydra.
You can read up on our general introduction to the Maze training workflow.