Maze RLlib Runner
The RLlib Runner allows you to use RLlib Trainers in combination with Maze models and environments. Ray RLlib is one of the most popular RL frameworks (algorithm collections), both in the scientific community and in practical applications. It comprises an extensive collection of tuned RL training algorithms. We introduce the Maze RLlib Module so that you can use RLlib's algorithm collection and still keep all of Maze's practical features. It wraps Maze models (including our extensive Perception Module), Maze environments (including wrappers) and the customizable Maze action distributions. It also lets you use the Maze Hydra command-line interface together with RLlib's well-optimized algorithms.
This page gives an overview of the RLlib module and provides examples of how to apply it.
List of Features
Use Maze environments, models and action distributions in conjunction with RLlib algorithms.
Make full use of the Maze environment customization utilities (wrappers, pre-processing, …).
Use the Hydra command-line interface to start training runs.
Models trained with the Maze RLlib Runner are fully compatible with the remaining framework (except when using the default RLlib models).
Example 1: Training with Maze-RLlib and Hydra
Using RLlib algorithms with Maze and Hydra works analogously to starting training with native Maze Trainers. To train the CartPole environment with RLlib’s PPO, run:
$ maze-run -cn conf_rllib env.name=CartPole-v0 rllib/algorithm=ppo
Here the -cn conf_rllib argument selects conf_rllib.yaml (available in the maze-rllib package) as our root config file. This file specifies how RLlib trainers are used within Maze. (For more on root configuration files, see Hydra overview.)
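The root config is combined with whatever overrides you pass on the command line, so it can help to inspect the result before training. Below is a minimal sketch of a hypothetical helper, show_rllib_config. It assumes that maze-run, like other Hydra applications, accepts Hydra's standard --cfg flag, which prints the composed configuration instead of running the job:

```shell
# Hypothetical helper: print the configuration composed from conf_rllib.yaml
# and the given overrides, without starting a training run.
# Assumption: maze-run forwards Hydra's standard --cfg flag.
show_rllib_config() {
  maze-run -cn conf_rllib "$@" --cfg job
}
```

$ show_rllib_config env.name=CartPole-v0 rllib/algorithm=ppo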
Example 2: Overriding Training Parameters
Similar to native Maze trainers, the parametrization of RLlib training runs is also done via Hydra. The main parameters for customizing training are:
Environment (env configuration group): configures which environment the training runs on; this stays the same as in, for example, maze-train.
Algorithm (rllib/algorithm configuration group): specifies the algorithm and its configuration (all supported algorithms).
Model (model configuration group): specifies how the models for policies and (optionally) critics are assembled; this also stays the same as in maze-train.
Runner (rllib/runner configuration group): specifies how training is run (e.g. locally or in development mode). The runner is also the main object responsible for administering the whole training run.
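The four configuration groups can be combined in a single command. Below is a minimal sketch with a hypothetical helper, train_rllib. It takes one value per group and passes any further arguments through as Hydra overrides. As in the examples on this page, the environment is set via env.name:

```shell
# Hypothetical helper: one positional argument per configuration group.
#   $1 -> env.name         (environment)
#   $2 -> rllib/algorithm  (algorithm)
#   $3 -> model            (model)
#   $4 -> rllib/runner     (runner)
# Any remaining arguments are appended as additional Hydra overrides.
train_rllib() {
  env_name=$1 algorithm=$2 model=$3 runner=$4
  shift 4
  maze-run -cn conf_rllib "env.name=${env_name}" "rllib/algorithm=${algorithm}" \
    "model=${model}" "rllib/runner=${runner}" "$@"
}
```

For instance, the following combines the RLlib default model with the development runner. Both values appear elsewhere on this page; whether every pairing of group values is valid depends on your setup.
$ train_rllib CartPole-v0 ppo rllib dev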
To train with a different algorithm, we simply specify the rllib/algorithm parameter:
$ maze-run -cn conf_rllib env.name=CartPole-v0 rllib/algorithm=a3c
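Only the rllib/algorithm override changes between such runs, so comparing several algorithms on the same environment comes down to a loop. Below is a minimal sketch; the helper name sweep_algorithms is hypothetical. The names passed in must be algorithms supported by the Maze RLlib Runner; ppo and a3c are the ones used on this page.

```shell
# Hypothetical helper: train CartPole-v0 once per algorithm given as argument,
# one run after the other. Stops at the first run that fails.
sweep_algorithms() {
  for algorithm in "$@"; do
    maze-run -cn conf_rllib env.name=CartPole-v0 "rllib/algorithm=${algorithm}" || return
  done
}
```

$ sweep_algorithms ppo a3c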
Furthermore, we have full access to the algorithm hyperparameters defined by RLlib and can override them. For example, to change the learning rate and the rollout fragment length, execute:
$ maze-run -cn conf_rllib env.name=CartPole-v0 rllib/algorithm=a3c \
algorithm.config.lr=0.001 algorithm.config.rollout_fragment_length=50
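These hyperparameters all live under algorithm.config, so passing several of them means repeating that prefix. Below is a minimal sketch of a hypothetical helper, train_a3c_with, that adds the prefix to each KEY=VALUE argument. Which keys are valid depends on the selected RLlib algorithm.

```shell
# Hypothetical helper: train A3C on CartPole-v0 and turn each KEY=VALUE argument
# into the Hydra override algorithm.config.KEY=VALUE.
train_a3c_with() {
  n=$#
  while [ "$n" -gt 0 ]; do
    # Append the prefixed copy of the first argument, then drop the original.
    set -- "$@" "algorithm.config.$1"
    shift
    n=$((n - 1))
  done
  maze-run -cn conf_rllib env.name=CartPole-v0 rllib/algorithm=a3c "$@"
}
```

The following runs the same command as above:
$ train_a3c_with lr=0.001 rollout_fragment_length=50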
Example 3: Training with RLlib’s Default Models
Finally, it is also possible to use the RLlib default model builder by specifying model=rllib. This loads the RLlib default model and its parameters, which can again be customized via Hydra:
$ maze-run -cn conf_rllib env.name=CartPole-v0 model=rllib \
model.fcnet_hiddens=[128,128] model.vf_share_layers=False
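The fcnet_hiddens value is a list, and square brackets are also shell glob syntax. Depending on the shell, the unquoted override may therefore fail; zsh, for example, aborts with "no matches found". Below is a minimal sketch of a hypothetical helper, train_default_model, that quotes the override. It takes the hidden layer sizes as a comma-separated argument.

```shell
# Hypothetical helper: train CartPole-v0 with RLlib's default model builder.
# $1 holds the hidden layer sizes, comma-separated (default: 128,128).
# The override is double-quoted so the shell does not glob-expand "[...]".
train_default_model() {
  hiddens=${1:-128,128}
  maze-run -cn conf_rllib env.name=CartPole-v0 model=rllib \
    "model.fcnet_hiddens=[${hiddens}]" model.vf_share_layers=False
}
```

$ train_default_model 256,256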
Supported Algorithms
The Bigger Picture
The figure below shows in more detail how the RLlib Module connects to the different Maze components:
Good to Know
Tip
Using the argument rllib/runner=dev starts Ray in local mode, sets the number of workers to 1 by default and increases the log level (resulting in more information being printed). This is especially useful for debugging.
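To debug, add the development runner to an otherwise unchanged training command. Below is a minimal sketch of a hypothetical wrapper, debug_rllib, that selects rllib/runner=dev and forwards all other overrides:

```shell
# Hypothetical helper: run a training command with the development runner
# (Ray local mode, one worker by default, more verbose logging).
debug_rllib() {
  maze-run -cn conf_rllib rllib/runner=dev "$@"
}
```

$ debug_rllib env.name=CartPole-v0 rllib/algorithm=ppo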
Tip
When watching the progress of RLlib training runs with TensorBoard, make sure to start TensorBoard with --reload_multifile true, as both Maze and RLlib write an event log.
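Below is a minimal sketch of the TensorBoard call, wrapped in a hypothetical helper, watch_training. The log directory is an assumption. Hydra writes run outputs below ./outputs by default, but your Maze setup may use a different location; if so, pass it as the first argument.

```shell
# Hypothetical helper: start TensorBoard on the training outputs.
# Assumption: runs are stored below ./outputs (Hydra's default); pass another
# directory as $1 if needed. --reload_multifile true makes TensorBoard keep
# polling every event file in a run directory, not only the newest one, which
# matters because Maze and RLlib each write their own.
watch_training() {
  tensorboard --logdir "${1:-outputs}" --reload_multifile true
}
```

$ watch_training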
Where to Go Next
After training, you might want to roll out the trained policy to further evaluate it or record the actions taken.
To create a custom Maze environment, you might want to review the Maze environment hierarchy and creating a Maze environment from scratch.
To build and use custom Maze models, please refer to the Maze Perception Module.
For more details on Hydra and how to use it, see configuration with Hydra.
You can read up on our general introduction to the Maze training workflow.