SACEvents¶
-
class
maze.train.trainers.sac.sac_events.
SACEvents
¶ Events specific to the SAC algorithm, used to record and analyse its behaviour in more detail.
-
buffer_avg_pick_per_transition
(value: int) → None¶ Record the cumulative moving average of the picks per transition of the buffer.
- Parameters
value – The cumulative moving average of the number of times a single transition is sampled from the trajectory buffer.
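A cumulative moving average can be maintained incrementally, without storing past samples. The following is a minimal sketch of that idea only; `CumulativeMovingAverage` and the sample pick counts are hypothetical and not part of Maze, and the way the trainer actually derives the value it passes to this event is not shown in this reference.

```python
class CumulativeMovingAverage:
    """Running mean over every value seen so far (hypothetical helper)."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def update(self, value: float) -> float:
        # CMA_{n+1} = CMA_n + (x_{n+1} - CMA_n) / (n + 1)
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean


# Made-up pick counts observed for three sampled transitions.
cma = CumulativeMovingAverage()
for picks in (1, 3, 2):
    avg_picks = cma.update(picks)
```

The resulting `avg_picks` is the kind of quantity that would be handed to `buffer_avg_pick_per_transition`.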
-
buffer_size
(value: int) → None¶ Record the size of the trajectory buffer.
- Parameters
value – The size of the trajectory buffer.
-
critic_grad_norm
(critic_key: Union[int, str], value: float) → None¶ Record the critic gradient norm.
- Parameters
critic_key – The key of the critic.
value – The value.
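The reference does not state which norm is recorded. As an illustration only, the sketch below assumes a global L2 norm over all parameter gradients of one critic, computed on plain Python lists; `global_l2_norm` and the gradient values are hypothetical.

```python
import math
from typing import Dict, List


def global_l2_norm(grads: List[List[float]]) -> float:
    """L2 norm over all gradient entries of all parameter tensors (assumed norm type)."""
    return math.sqrt(sum(g * g for tensor in grads for g in tensor))


# Made-up gradients for two critics, keyed by critic_key.
critic_grads: Dict[int, List[List[float]]] = {
    0: [[3.0, 0.0], [4.0]],
    1: [[0.0], [0.0, 0.0]],
}

# One value per critic, matching the (critic_key, value) shape of the event.
norms = {key: global_l2_norm(grads) for key, grads in critic_grads.items()}
```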
-
critic_value
(critic_key: Union[int, str], value: float) → None¶ Record the critic value.
- Parameters
critic_key – The key of the critic.
value – The value.
-
critic_value_loss
(critic_key: Union[int, str], value: float) → None¶ Record the critic value loss.
- Parameters
critic_key – The key of the critic.
value – The value.
-
entropy_coef
(step_key: Union[str, int], value: float) → None¶ Record the current entropy coefficient; mainly of interest when entropy tuning is used.
- Parameters
step_key – The step_key of the multi-step env.
value – The current value of the entropy coefficient.
-
entropy_loss
(step_key: Union[str, int], value: float) → None¶ Record the current entropy loss; mainly of interest when entropy tuning is used.
- Parameters
step_key – The step_key of the multi-step env.
value – The current value of the entropy loss.
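For context, entropy tuning in SAC is commonly formulated as minimising a loss over the log of the entropy coefficient, so that the coefficient grows when the policy entropy falls below a target. The sketch below shows one gradient step of that common formulation in plain Python. It is an assumption that Maze follows this exact form; `entropy_tuning_step`, the learning rate, and the sample values are hypothetical.

```python
import math
from typing import List, Tuple


def entropy_tuning_step(
    log_alpha: float, logps: List[float], target_entropy: float, lr: float
) -> Tuple[float, float, float]:
    """Return (entropy_loss, new_log_alpha, new_alpha) for one plain gradient step."""
    # Mean of (log pi(a|s) + target entropy); treated as a constant w.r.t. log_alpha.
    gap = sum(lp + target_entropy for lp in logps) / len(logps)
    entropy_loss = -log_alpha * gap
    # d(entropy_loss) / d(log_alpha) = -gap
    new_log_alpha = log_alpha - lr * (-gap)
    return entropy_loss, new_log_alpha, math.exp(new_log_alpha)


# Made-up log-probabilities of sampled actions; entropy is below the target here.
loss, log_alpha, alpha = entropy_tuning_step(
    log_alpha=0.5, logps=[-0.5, -1.5], target_entropy=2.0, lr=0.1
)
```

In this example the coefficient increases after the step, which is the behaviour one would expect to see in the `entropy_coef` statistics while the policy is less random than the target.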
-
errors_between_critics
(critic_key: Union[int, str], value: float) → None¶ Record the error between critic and target critic.
- Parameters
critic_key – The key of the critic.
value – The value.
-
estimated_queue_sizes
(before: int, after: int) → None¶ Record the estimated queue size before and after the collection of the actors' output.
- Parameters
before – The estimated queue size before collection.
after – The estimated queue size after collection.
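Queue sizes are typically only estimates because other workers may add or remove items concurrently. The sketch below illustrates the before/after measurement with the standard library `queue.Queue`, whose `qsize()` returns an approximate size. The queue type used by the trainer is not specified here, and `collect_actor_outputs` is a hypothetical helper.

```python
import queue
from typing import Any, List, Tuple


def collect_actor_outputs(q: "queue.Queue[Any]") -> Tuple[List[Any], int, int]:
    """Drain the queue and return (outputs, size_before, size_after)."""
    before = q.qsize()  # approximate when producers run concurrently
    outputs: List[Any] = []
    while True:
        try:
            outputs.append(q.get_nowait())
        except queue.Empty:
            break
    after = q.qsize()
    return outputs, before, after


# Single-threaded demo with made-up actor outputs.
actor_queue: "queue.Queue[Any]" = queue.Queue()
for i in range(3):
    actor_queue.put({"rollout": i})

outputs, before, after = collect_actor_outputs(actor_queue)
```

The pair `(before, after)` has the same shape as the arguments of `estimated_queue_sizes`.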
-
policy_entropy
(step_key: Union[int, str], value: float) → None¶ Record the policy entropy.
- Parameters
step_key – The step_key of the multi-step env.
value – The value.
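As a reminder of what the recorded quantity measures, the sketch below computes the Shannon entropy of a discrete action distribution. It assumes a categorical policy head; for continuous policies the entropy is usually estimated differently, and this reference does not say which estimate Maze records. `categorical_entropy` is a hypothetical helper.

```python
import math
from typing import List


def categorical_entropy(probs: List[float]) -> float:
    """Shannon entropy (in nats) of a categorical distribution."""
    # Zero-probability actions contribute nothing to the sum.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


uniform = categorical_entropy([0.25, 0.25, 0.25, 0.25])  # maximally random policy
peaked = categorical_entropy([1.0, 0.0, 0.0, 0.0])       # deterministic policy
```

A value that drifts toward zero during training indicates that the policy is becoming close to deterministic.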
-
policy_grad_norm
(step_key: Union[int, str], value: float) → None¶ Record the policy gradient norm.
- Parameters
step_key – The step_key of the multi-step env.
value – The value.
-
policy_loss
(step_key: Union[int, str], value: float) → None¶ Record the policy loss.
- Parameters
step_key – The step_key of the multi-step env.
value – The value.
-
policy_mean_logp
(step_key: Union[int, str], value: float) → None¶ Record the mean log-probability (logp) of the policy.
- Parameters
step_key – The step_key of the multi-step env.
value – The value.
-
time_backprob
(time: float, percent: float) → None¶ Record the total time it took the learner to backpropagate the loss, plus its share of the total update time.
- Parameters
time – The absolute time it took for the computation.
percent – The relative percentage this computation took w.r.t. one update.
-
time_dequeuing_actors
(time: float, percent: float) → None¶ Record the time it took to dequeue the actors' output from the synced queue, plus its share of the total update time.
- Parameters
time – The absolute time it took for the computation.
percent – The relative percentage this computation took w.r.t. one update.
-
time_learner_rollout
(time: float, percent: float) → None¶ Record the total time it took the learner to compute the logits on the agents' output, plus its share of the total update time.
- Parameters
time – The absolute time it took for the computation.
percent – The relative percentage this computation took w.r.t. one update.
-
time_loss_computation
(time: float, percent: float) → None¶ Record the total time it took the learner to compute the loss, plus its share of the total update time.
- Parameters
time – The absolute time it took for the computation.
percent – The relative percentage this computation took w.r.t. one update.
-
time_sampling_from_buffer
(time: float, percent: float) → None¶ Record the total time it took the learner to sample from the buffer, plus its share of the total update time.
- Parameters
time – The absolute time it took for the computation.
percent – The relative percentage this computation took w.r.t. one update.
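All time_* events share the same (time, percent) shape. The sketch below shows one way such pairs could be derived from per-phase wall-clock measurements. It assumes that the total update time is the sum of the measured phases and that percent is expressed on a 0 to 100 scale; neither is confirmed by this reference. `timed` and `time_and_percent` are hypothetical helpers.

```python
import time
from typing import Callable, Dict, Tuple


def timed(fn: Callable[[], None]) -> float:
    """Run fn and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def time_and_percent(phase_times: Dict[str, float]) -> Dict[str, Tuple[float, float]]:
    """Map each phase to (absolute time, percentage of the summed update time)."""
    total = sum(phase_times.values())
    return {
        name: (t, 100.0 * t / total if total > 0 else 0.0)
        for name, t in phase_times.items()
    }


# Fixed, made-up phase durations in seconds for one update.
report = time_and_percent(
    {"sampling_from_buffer": 0.01, "loss_computation": 0.03, "backprob": 0.06}
)

# A real measurement would wrap each phase, for example:
elapsed = timed(lambda: sum(range(1000)))
```

Each `(time, percent)` tuple in `report` corresponds to the arguments of the matching time_* event.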
-