Cheat Sheet

Run a rollout to test an environment with random action sampling:

maze-run -cn conf_rollout env.name=CartPole-v1 policy=random_policy

Run a rollout and render the state of the environment:

maze-run -cn conf_rollout env.name=CartPole-v1 policy=random_policy \
runner=sequential runner.render=true

Train a policy with evolution strategies (ES):

maze-run -cn conf_train env.name=CartPole-v1 algorithm=es model=vector_obs

Train a policy with an actor-critic trainer such as A2C:

maze-run -cn conf_train env.name=CartPole-v1 algorithm=a2c \
model=vector_obs critic=template_state

Resume training from a previous model state:

maze-run -cn conf_train env.name=CartPole-v1 algorithm=a2c \
model=vector_obs critic=template_state input_dir=outputs/<experiment-dir>
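
The <experiment-dir> placeholder has to be filled in by hand. As a convenience, the sketch below shows one way to look up the most recently modified run directory. The function latest_experiment_dir is a hypothetical shell helper, not part of Maze. It assumes each run is written to its own direct subdirectory of outputs/ and that modification time identifies the latest run; adjust it if your output layout is nested differently.

```shell
# Hypothetical helper (not part of Maze): print the most recently modified
# subdirectory of a base directory (default: outputs), without a trailing slash.
latest_experiment_dir() {
    base="${1:-outputs}"
    # ls -td lists the matching directories sorted by modification time, newest first.
    latest=$(ls -td "$base"/*/ 2>/dev/null | head -n 1)
    # Fail with a non-zero status if the base directory has no subdirectories.
    [ -n "$latest" ] || return 1
    printf '%s\n' "${latest%/}"
}

# Usage sketch (assumes the newest directory is the run you want to resume):
#   maze-run -cn conf_train env.name=CartPole-v1 algorithm=a2c \
#   model=vector_obs critic=template_state input_dir="$(latest_experiment_dir outputs)"
```

Check the printed path before resuming, since any unrelated directory created later under outputs/ would be picked up instead.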

Run a rollout of the policy trained with the command above:

maze-run -cn conf_rollout env.name=CartPole-v1 model=vector_obs \
policy=torch_policy input_dir=outputs/<experiment-dir>