5. Training the MazeEnv¶

The complete code for this part of the tutorial can be found here

# file structure
- cutting_2d
    - main.py
    - env ...
    - space_interfaces ...
    - conf
        - env
            - tutorial_cutting_2d_basic.yaml  # new
        - model
            - tutorial_cutting_2d_basic.yaml  # new
        - wrappers
            - tutorial_cutting_2d_basic.yaml  # new

Note

Hydra only accepts .yaml as file extension.

Page Overview

Hydra Configuration
Training an Agent

5.1. Hydra Configuration ¶

The entire Maze workflow is boosted by the Hydra configuration system. To be able to perform our first training run via the Maze CLI we have to add a few more config files. Going into the very details of the config structure is for now beyond the scope of this tutorial. However, we still provide some information on the parts relevant for this example.

The config file for the maze_env_factory looks as follows:

conf/env/tutorial_cutting_2d_basic.yaml¶

# @package env
_target_: tutorial_maze_env.part03_maze_env.env.maze_env.maze_env_factory

# parametrizes the core environment
max_pieces_in_inventory: 200
raw_piece_size: [100, 100]
static_demand: [30, 15]

Additionally, we also provide a wrapper config but refer to Customizing Environments with Wrappers for details.

conf/wrappers/tutorial_cutting_2d_basic.yaml¶

# @package wrappers

# limits the maximum number of time steps of an episode
maze.core.wrappers.time_limit_wrapper.TimeLimitWrapper:
  max_episode_steps: 200

# flattens the dictionary observations to work with DenseLayers
maze.core.wrappers.observation_preprocessing.preprocessing_wrapper.PreProcessingWrapper:
  pre_processor_mapping:
    - observation: inventory
      _target_: maze.preprocessors.FlattenPreProcessor
      keep_original: false
      config:
        num_flatten_dims: 2

# monitoring wrapper
maze.core.wrappers.monitoring_wrapper.MazeEnvMonitoringWrapper:
  observation_logging: false
  action_logging: true
  reward_logging: false

To learn more about the model config in conf/env_model/tutorial_cutting_2d_basic.yaml you can visit the introduction on how to work with template models.

5.2. Training an Agent ¶

Once the config is set up we are good to go to start our first training run (in the cmd below with the PPO algorithm) via the CLI with

maze-run -cn conf_train env=tutorial_cutting_2d_basic wrappers=tutorial_cutting_2d_basic \
model=tutorial_cutting_2d_basic algorithm=ppo

Running the trainer should print a command line output similar to the one shown below.

 step|path                                                                        |               value
=====|============================================================================|====================
train     MultiStepActorCritic..time_epoch            ······················|              24.333
train     MultiStepActorCritic..time_rollout          ······················|               0.754
train     MultiStepActorCritic..learning_rate         ······················|               0.000
train     MultiStepActorCritic..policy_loss           0                     |              -0.016
train     MultiStepActorCritic..policy_grad_norm      0                     |               0.015
train     MultiStepActorCritic..policy_entropy        0                     |               0.686
train     MultiStepActorCritic..critic_value          0                     |             -56.659
train     MultiStepActorCritic..critic_value_loss     0                     |              33.026
train     MultiStepActorCritic..critic_grad_norm      0                     |               0.500
train     MultiStepActorCritic..time_update           ······················|               1.205
train     DiscreteActionEvents  action                substep_0/order       |   [len:8000, μ:0.5]
train     DiscreteActionEvents  action                substep_0/piece_idx   | [len:8000, μ:169.2]
train     DiscreteActionEvents  action                substep_0/rotation    |   [len:8000, μ:1.0]
train     BaseEnvEvents         reward                median_step_count     |             200.000
train     BaseEnvEvents         reward                mean_step_count       |             200.000
train     BaseEnvEvents         reward                total_step_count      |           96000.000
train     BaseEnvEvents         reward                total_episode_count   |             480.000
train     BaseEnvEvents         reward                episode_count         |              40.000
train     BaseEnvEvents         reward                std                   |              34.248
train     BaseEnvEvents         reward                mean                  |            -186.450
train     BaseEnvEvents         reward                min                   |            -259.000
train     BaseEnvEvents         reward                max                   |            -130.000

To get a nicer view on these numbers we can also take a look at the stats with Tensorboard.

tensorboard --logdir outputs

You can view it with your browser at http://localhost:6006/.

For now we can only inspect standard metrics such as reward statistics or mean_step_counts per episode. Unfortunately, this is not too informative with respect to the cutting problem we are currently addressing. In the next part we will show how to make logging much more informative by introducing events and KPIs.

5. Training the MazeEnv¶

5.1. Hydra Configuration¶

5.2. Training an Agent¶

5.1. Hydra Configuration ¶

5.2. Training an Agent ¶