ActorCriticEvents

class maze.train.trainers.common.actor_critic.actor_critic_events.ActorCriticEvents

Event interface, defining statistics emitted by the A2CTrainer.

critic_grad_norm(critic_id: int, value: float): gradient norm of the step critic

critic_value(critic_id: int, value: float): value estimate of the step critic

critic_value_loss(critic_id: int, value: float): optimization loss of the step critic

discounted_returns(critic_id: int, value: float): discounted returns actually observed from the environment

learning_rate(value: float): optimizer learning rate

policy_entropy(substep_key: int, value: float): entropy of the step policy

policy_grad_norm(substep_key: int, value: float): gradient norm of the step policy

policy_loss(substep_key: int, value: float): optimization loss of the step policy

time_epoch(value: float): time required for one epoch

time_rollout(value: float): time required for the rollout

time_update(value: float): time required for the update
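
The event signatures listed above can be illustrated with a small, self-contained sketch. It does not use Maze itself: `RecordingEvents`, `epoch_mean`, `discounted_return` and `entropy` are hypothetical names introduced only for this example, and the formulas are the textbook definitions of a discounted return and of Shannon entropy, which may differ in detail from what the trainer actually computes. Only three of the listed events are mirrored to keep the sketch short.

```python
import math
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional, Sequence, Tuple


class RecordingEvents:
    """Hypothetical stand-in (not part of Maze) mirroring three of the listed signatures."""

    def __init__(self) -> None:
        # (event name, critic_id / substep_key or None) -> values recorded so far
        self.values: Dict[Tuple[str, Optional[int]], List[float]] = defaultdict(list)

    def policy_entropy(self, substep_key: int, value: float) -> None:
        self.values[("policy_entropy", substep_key)].append(value)

    def discounted_returns(self, critic_id: int, value: float) -> None:
        self.values[("discounted_returns", critic_id)].append(value)

    def learning_rate(self, value: float) -> None:
        self.values[("learning_rate", None)].append(value)

    def epoch_mean(self, name: str, key: Optional[int] = None) -> float:
        """Average all values recorded for one event and key (assumed aggregation)."""
        return mean(self.values[(name, key)])


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Compute sum_t gamma**t * rewards[t] by accumulating from the last step backwards."""
    ret = 0.0
    for reward in reversed(rewards):
        ret = reward + gamma * ret
    return ret


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Usage: fire a few events the way a training loop might, then aggregate.
events = RecordingEvents()
events.discounted_returns(critic_id=0, value=discounted_return([1.0, 1.0, 1.0], gamma=0.5))
events.policy_entropy(substep_key=0, value=entropy([0.5, 0.5]))
events.policy_entropy(substep_key=0, value=entropy([1.0, 0.0]))
events.learning_rate(value=1e-3)

print(events.epoch_mean("discounted_returns", 0))
print(events.epoch_mean("policy_entropy", 0))
```

Keying the stored values by event name plus `critic_id` or `substep_key` keeps the statistics of different sub-steps separate, which is the reason the listed methods take those arguments in addition to `value`.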