SACEvents

class maze.train.trainers.sac.sac_events.SACEvents

Events specific to the SAC algorithm, used to record and analyse its behaviour in more detail.

buffer_avg_pick_per_transition(value: int) → None

Record the cumulative moving average of the picks per transition of the buffer.

Parameters

value – The cumulative moving average of the number of times a single transition is sampled from the trajectory buffer.
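As an illustration of the statistic described above, a cumulative moving average can be updated incrementally without storing past values. This is a minimal sketch only; `update_cma` and the sample pick counts are hypothetical and not part of maze.

```python
def update_cma(cma: float, n: int, new_value: float) -> float:
    """Return the cumulative moving average after one more observation.

    cma is the average of the first n observations (hypothetical helper,
    not a maze function).
    """
    return cma + (new_value - cma) / (n + 1)


# Illustrative pick counts observed for sampled transitions (made-up data).
cma, n = 0.0, 0
for picks in [1, 3, 2]:
    cma = update_cma(cma, n, picks)
    n += 1
```

The resulting `cma` is the kind of value that would be passed to `buffer_avg_pick_per_transition`.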

buffer_size(value: int) → None

Record the size of the trajectory buffer.

Parameters

value – The size of the trajectory buffer.

critic_grad_norm(critic_key: Union[int, str], value: float) → None

Record the critic gradient norm.

Parameters
  • critic_key – The key of the critic.

  • value – The gradient norm of the critic.
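The source does not state which norm is recorded. A common choice is the global L2 norm over all parameter gradients (the quantity that, for example, `torch.nn.utils.clip_grad_norm_` returns). The sketch below assumes that choice and uses plain Python lists in place of tensors; `global_l2_norm` is a hypothetical helper.

```python
import math
from typing import Iterable, Sequence


def global_l2_norm(grads: Iterable[Sequence[float]]) -> float:
    """L2 norm over all gradient entries of all parameter groups, flattened.

    Assumption: the recorded norm is a global L2 norm; maze may differ.
    """
    return math.sqrt(sum(g * g for group in grads for g in group))


# Two made-up parameter groups with their gradients.
critic_grads = [[3.0, 0.0], [4.0]]
norm = global_l2_norm(critic_grads)
```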

critic_value(critic_key: Union[int, str], value: float) → None

Record the critic value.

Parameters
  • critic_key – The key of the critic.

  • value – The critic value.

critic_value_loss(critic_key: Union[int, str], value: float) → None

Record the critic value loss.

Parameters
  • critic_key – The key of the critic.

  • value – The critic value loss.

entropy_coef(step_key: Union[str, int], value: float) → None

Record the current entropy coefficient. This is mainly of interest when entropy tuning is enabled.

Parameters
  • step_key – The step_key of the multi-step env.

  • value – The current value of the entropy coefficient.

entropy_loss(step_key: Union[str, int], value: float) → None

Record the current entropy loss. This is mainly of interest when entropy tuning is enabled.

Parameters
  • step_key – The step_key of the multi-step env.

  • value – The current value of the entropy loss.
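For context, the entropy coefficient and entropy loss are often computed as in the sketch below, which follows the commonly used SAC temperature update. The source does not confirm that maze uses this exact formulation; `entropy_coef_and_loss`, `log_alpha`, and `target_entropy` are assumed names.

```python
import math
from typing import Sequence, Tuple


def entropy_coef_and_loss(
    log_alpha: float, log_probs: Sequence[float], target_entropy: float
) -> Tuple[float, float]:
    """Return (entropy coefficient, entropy loss) for one batch.

    Assumed formulation: alpha = exp(log_alpha) and
    loss = mean(-log_alpha * (log_prob + target_entropy)).
    """
    alpha = math.exp(log_alpha)
    loss = -sum(log_alpha * (lp + target_entropy) for lp in log_probs) / len(log_probs)
    return alpha, loss


# Made-up batch of action log-probabilities.
alpha, loss = entropy_coef_and_loss(
    log_alpha=math.log(0.5), log_probs=[-1.0, -2.0], target_entropy=1.0
)
```

Under these assumptions, `alpha` is the kind of value reported through `entropy_coef` and `loss` through `entropy_loss`, once per `step_key`.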

errors_between_critics(critic_key: Union[int, str], value: float) → None

Record the error between the critic and the target critic.

Parameters
  • critic_key – The key of the critic.

  • value – The error between the critic and the target critic.

estimated_queue_sizes(before: int, after: int) → None

Record the estimated queue size before and after the collection of the actors' output.

Parameters
  • before – The estimated queue size before collection.

  • after – The estimated queue size after collection.

policy_entropy(step_key: Union[int, str], value: float) → None

Record the policy entropy.

Parameters
  • step_key – The step_key of the multi-step env.

  • value – The policy entropy.

policy_grad_norm(step_key: Union[int, str], value: float) → None

Record the policy gradient norm.

Parameters
  • step_key – The step_key of the multi-step env.

  • value – The gradient norm of the policy.

policy_loss(step_key: Union[int, str], value: float) → None

Record the policy loss.

Parameters
  • step_key – The step_key of the multi-step env.

  • value – The policy loss.

policy_mean_logp(step_key: Union[int, str], value: float) → None

Record the mean log-probability (logp) of the policy.

Parameters
  • step_key – The step_key of the multi-step env.

  • value – The mean policy logp.
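The keyed events above take either a `critic_key` or a `step_key` alongside the value, which lets recorded values be grouped per critic or per sub-step. The sketch below shows the call shape with a hypothetical stand-in class. `RecordingEvents` is not maze's implementation and only mirrors a few of the signatures listed on this page.

```python
from collections import defaultdict
from statistics import mean
from typing import Union


class RecordingEvents:
    """Hypothetical stand-in that stores event values grouped by event name and key."""

    def __init__(self) -> None:
        self.records = defaultdict(list)

    def policy_loss(self, step_key: Union[int, str], value: float) -> None:
        self.records[("policy_loss", step_key)].append(value)

    def critic_value(self, critic_key: Union[int, str], value: float) -> None:
        self.records[("critic_value", critic_key)].append(value)


events = RecordingEvents()
events.policy_loss(step_key=0, value=0.5)
events.policy_loss(step_key=0, value=0.3)
events.critic_value(critic_key="q1", value=1.2)

# Aggregate each (event, key) group, here with a simple mean.
summary = {group: mean(values) for group, values in events.records.items()}
```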

time_backprob(time: float, percent: float) → None

Record the total time it took the learner to backpropagate the loss, together with that time as a percentage of the total update time.

Parameters
  • time – The absolute time it took for the computation.

  • percent – The percentage of one update's total time that this computation took.

time_dequeuing_actors(time: float, percent: float) → None

Record the time it took to dequeue the actors' output from the synced queue, together with that time as a percentage of the total update time.

Parameters
  • time – The absolute time it took for the computation.

  • percent – The percentage of one update's total time that this computation took.

time_learner_rollout(time: float, percent: float) → None

Record the total time it took the learner to compute the logits on the agents' output, together with that time as a percentage of the total update time.

Parameters
  • time – The absolute time it took for the computation.

  • percent – The percentage of one update's total time that this computation took.

time_loss_computation(time: float, percent: float) → None

Record the total time it took the learner to compute the loss, together with that time as a percentage of the total update time.

Parameters
  • time – The absolute time it took for the computation.

  • percent – The percentage of one update's total time that this computation took.

time_sampling_from_buffer(time: float, percent: float) → None

Record the total time it took the learner to sample from the buffer, together with that time as a percentage of the total update time.

Parameters
  • time – The absolute time it took for the computation.

  • percent – The percentage of one update's total time that this computation took.
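All time_* events take an absolute duration and its share of one update. The sketch below shows one way such pairs could be derived from per-phase timings. It assumes `percent` is on a 0 to 100 scale, which the source does not confirm; `timed`, `as_percent`, and the phase durations are hypothetical.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed(fn: Callable[[], T]) -> Tuple[T, float]:
    """Run fn and return its result with the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def as_percent(part: float, total: float) -> float:
    """Share of total taken by part, on an assumed 0 to 100 scale."""
    return 100.0 * part / total if total > 0 else 0.0


# Made-up phase durations (seconds) for a single update.
phase_times = {"sampling": 0.02, "loss": 0.05, "backprop": 0.03}
total = sum(phase_times.values())
report = {name: (t, as_percent(t, total)) for name, t in phase_times.items()}

# Measuring a real phase would look like this (the lambda is a placeholder workload).
_, elapsed = timed(lambda: sum(range(1000)))
```

Each `(time, percent)` pair in `report` has the shape expected by the timing events, for example `time_loss_computation(time, percent)`.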