Policy and Value Networks
Concepts and Structure
Best Practices and Tutorials
Logging and Monitoring
Scaling the Training Process
Base class for model selection strategies.
Receives a new evaluation result from the model. Should be only called once per epoch.
reward – mean evaluation reward.