Workflow
Policy and Value Networks
Training
Concepts and Structure
Environment Customization
Best Practices and Tutorials
Logging and Monitoring
Scaling the Training Process
maze.train.trainers.es.es_events.
ESEvents
Event interface, defining statistics emitted by the ESTrainer.
policy_grad_norm
gradient norm of the step policies
policy_norm
l2 norm of the step policy parameters
real_time
elapsed real time per iteration (=epoch)
update_ratio
norm(optimizer step) / norm(all parameters)