5. Training the MazeEnv
The complete code for this part of the tutorial can be found here
# file structure
- cutting_2d
    - main.py
    - env ...
    - space_interfaces ...
    - conf
        - env
            - tutorial_cutting_2d_basic.yaml  # new
        - model
            - tutorial_cutting_2d_basic.yaml  # new
        - wrappers
            - tutorial_cutting_2d_basic.yaml  # new
Hydra only accepts .yaml as file extension.
The entire Maze workflow is powered by the Hydra configuration system. To perform our first training run via the Maze CLI, we have to add a few more config files. Covering the config structure in full detail is beyond the scope of this tutorial for now. However, we still provide some information on the parts relevant for this example.
The config file for the maze_env_factory looks as follows:
# @package env
_target_: tutorial_maze_env.part03_maze_env.env.maze_env.maze_env_factory

# parametrizes the core environment
max_pieces_in_inventory: 200
raw_piece_size: [100, 100]
static_demand: [30, 15]
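To make the role of the `_target_` key more concrete, the following is a minimal sketch of the idea behind it: the dotted path is resolved to a callable, and the remaining keys are passed as keyword arguments. This is a simplified stand-in and not Hydra's actual implementation (Hydra provides `hydra.utils.instantiate` for this purpose). The helper `instantiate_from_config` and the demo config are hypothetical, and a standard-library callable replaces `maze_env_factory` so that the sketch runs without the tutorial package installed.

```python
import importlib
from typing import Any, Dict


def instantiate_from_config(config: Dict[str, Any]) -> Any:
    """Resolve `_target_` to a callable and call it with the remaining keys.

    Hypothetical helper for illustration only; it ignores nested configs,
    interpolation and everything else Hydra handles.
    """
    params = dict(config)  # copy so the caller's config is not mutated
    target_path = params.pop("_target_")
    module_name, _, attr_name = target_path.rpartition(".")
    target = getattr(importlib.import_module(module_name), attr_name)
    return target(**params)


# Stand-in config (assumption): datetime.timedelta plays the role of
# maze_env_factory, and hours/minutes play the role of the env parameters.
demo_config = {"_target_": "datetime.timedelta", "hours": 1, "minutes": 30}

delta = instantiate_from_config(demo_config)
print(delta.total_seconds())
```

In the env config above, the same mechanism would call `maze_env_factory` with `max_pieces_in_inventory`, `raw_piece_size` and `static_demand` as keyword arguments.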
Additionally, we also provide a wrapper config but refer to Customizing Environments with Wrappers for details.
# @package wrappers

# limits the maximum number of time steps of an episode
maze.core.wrappers.time_limit_wrapper.TimeLimitWrapper:
  max_episode_steps: 200

# flattens the dictionary observations to work with DenseLayers
maze.core.wrappers.observation_preprocessing.preprocessing_wrapper.PreProcessingWrapper:
  pre_processor_mapping:
    - observation: inventory
      _target_: maze.preprocessors.FlattenPreProcessor
      keep_original: false
      config:
        num_flatten_dims: 2

# monitoring wrapper
maze.core.wrappers.monitoring_wrapper.MazeEnvMonitoringWrapper:
  observation_logging: false
  action_logging: true
  reward_logging: false
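As an illustration of what a time limit wrapper does conceptually, here is a minimal sketch that ends an episode once `max_episode_steps` steps have elapsed. It is not the Maze `TimeLimitWrapper` implementation: the classes `CountingEnv` and `StepLimit` and the function `run_episode` are hypothetical, and a Gym-style `step` return of `(observation, reward, done, info)` is assumed.

```python
from typing import Any, Dict, Tuple

StepResult = Tuple[int, float, bool, Dict[str, Any]]


class CountingEnv:
    """Hypothetical toy environment that never terminates on its own."""

    def reset(self) -> int:
        return 0

    def step(self, action: int) -> StepResult:
        # observation, reward, done, info (Gym-style 4-tuple, an assumption)
        return 0, -1.0, False, {}


class StepLimit:
    """Illustrative wrapper: forces done=True after max_episode_steps steps."""

    def __init__(self, env: CountingEnv, max_episode_steps: int) -> None:
        self.env = env
        self.max_episode_steps = max_episode_steps
        self.elapsed_steps = 0

    def reset(self) -> int:
        self.elapsed_steps = 0  # a new episode starts counting from zero
        return self.env.reset()

    def step(self, action: int) -> StepResult:
        obs, reward, done, info = self.env.step(action)
        self.elapsed_steps += 1
        if self.elapsed_steps >= self.max_episode_steps:
            done = True  # cut the episode off at the step limit
        return obs, reward, done, info


def run_episode(env: StepLimit) -> int:
    """Step with a dummy action until the episode ends; return its length."""
    env.reset()
    steps, done = 0, False
    while not done:
        _, _, done, _ = env.step(0)
        steps += 1
    return steps


episode_length = run_episode(StepLimit(CountingEnv(), max_episode_steps=200))
print(episode_length)
```

With `max_episode_steps: 200` as in the wrapper config, an environment that does not terminate earlier produces episodes of exactly 200 steps.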
To learn more about the model config, you can visit the introduction on how to work with template models.
Once the config is set up, we can start our first training run (here with the PPO algorithm), either via the CLI or with the equivalent Python API call:
maze-run -cn conf_train env=tutorial_cutting_2d_basic wrappers=tutorial_cutting_2d_basic \
    model=tutorial_cutting_2d_basic algorithm=ppo
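# the same training run, started from Python instead of the CLI
from maze.api.run_context import RunContext

rc = RunContext(
    env="tutorial_cutting_2d_basic",
    wrappers="tutorial_cutting_2d_basic",
    model="tutorial_cutting_2d_basic",
    algorithm="ppo"
)
rc.train()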
Running the trainer should print a command line output similar to the one shown below.
 step|path                                                                        |               value
=====|============================================================================|====================
   12|train     MultiStepActorCritic..time_epoch            ······················|              24.333
   12|train     MultiStepActorCritic..time_rollout          ······················|               0.754
   12|train     MultiStepActorCritic..learning_rate         ······················|               0.000
   12|train     MultiStepActorCritic..policy_loss           0                     |              -0.016
   12|train     MultiStepActorCritic..policy_grad_norm      0                     |               0.015
   12|train     MultiStepActorCritic..policy_entropy        0                     |               0.686
   12|train     MultiStepActorCritic..critic_value          0                     |             -56.659
   12|train     MultiStepActorCritic..critic_value_loss     0                     |              33.026
   12|train     MultiStepActorCritic..critic_grad_norm      0                     |               0.500
   12|train     MultiStepActorCritic..time_update           ······················|               1.205
   12|train     DiscreteActionEvents  action                substep_0/order       |   [len:8000, μ:0.5]
   12|train     DiscreteActionEvents  action                substep_0/piece_idx   | [len:8000, μ:169.2]
   12|train     DiscreteActionEvents  action                substep_0/rotation    |   [len:8000, μ:1.0]
   12|train     BaseEnvEvents         reward                median_step_count     |             200.000
   12|train     BaseEnvEvents         reward                mean_step_count       |             200.000
   12|train     BaseEnvEvents         reward                total_step_count      |           96000.000
   12|train     BaseEnvEvents         reward                total_episode_count   |             480.000
   12|train     BaseEnvEvents         reward                episode_count         |              40.000
   12|train     BaseEnvEvents         reward                std                   |              34.248
   12|train     BaseEnvEvents         reward                mean                  |            -186.450
   12|train     BaseEnvEvents         reward                min                   |            -259.000
   12|train     BaseEnvEvents         reward                max                   |            -130.000
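The counters in this output are consistent with each other, which helps when reading the table. The short check below copies the numbers from the output above and verifies how they relate. It assumes that the step column counts completed training epochs, which the output itself does not state. The variable names are chosen for illustration.

```python
# Values copied from the training output above.
epochs = 12                  # "step" column, assumed to count completed epochs
episodes_per_epoch = 40      # episode_count
mean_step_count = 200        # mean_step_count (equals the time limit of 200)
total_episode_count = 480    # total_episode_count
total_step_count = 96000     # total_step_count
action_samples = 8000        # "len" of each DiscreteActionEvents entry

# Every episode runs for 200 steps, so one epoch collects 40 * 200 steps,
# which matches the number of logged actions per action head.
steps_per_epoch = episodes_per_epoch * mean_step_count
assert steps_per_epoch == action_samples

# The totals accumulate over all epochs so far.
assert epochs * episodes_per_epoch == total_episode_count
assert total_episode_count * mean_step_count == total_step_count

print(steps_per_epoch)
```

Since the median and mean step count both equal 200, every episode in this epoch ran until the time limit set in the wrapper config.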
To get a nicer view on these numbers we can also take a look at the stats with Tensorboard.
tensorboard --logdir outputs
You can view it with your browser at http://localhost:6006/.
For now we can only inspect standard metrics such as reward statistics or mean_step_counts per episode. Unfortunately, this is not too informative with respect to the cutting problem we are currently addressing. In the next part we will show how to make logging much more informative by introducing events and KPIs.