SACAlgorithmConfig
class maze.train.trainers.sac.sac_algorithm_config.SACAlgorithmConfig(n_epochs: int, n_rollout_steps: int, lr: float, entropy_coef: float, gamma: float, max_grad_norm: float, num_actors: int, batch_size: int, num_batches_per_iter: int, tau: float, target_update_interval: int, device: str, entropy_tuning: bool, target_entropy_multiplier: float, entropy_coef_lr: float, split_rollouts_into_transitions: bool, replay_buffer_size: int, initial_buffer_size: int, initial_sampling_policy: Union[maze.core.agent.policy.Policy, None, Mapping[str, Any], Any], rollouts_per_iteration: int, epoch_length: int, patience: int, rollout_evaluator: maze.train.trainers.common.evaluators.rollout_evaluator.RolloutEvaluator)

  Algorithm parameters for SAC.
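As a minimal sketch of how the parameters above fit together, the dictionary below mirrors the constructor signature; all values are illustrative placeholders (not recommended defaults), and the commented-out construction at the end assumes a working maze installation:

```python
# Illustrative keyword arguments mirroring the SACAlgorithmConfig signature.
# Every value here is an example placeholder, not a tuned default.
sac_config = dict(
    n_epochs=10,
    n_rollout_steps=1,
    lr=3e-4,
    entropy_coef=0.2,              # "alpha"; only used when entropy_tuning is False
    gamma=0.99,
    max_grad_norm=0.5,
    num_actors=1,
    batch_size=256,
    num_batches_per_iter=1,
    tau=0.005,
    target_update_interval=1,
    device="cpu",
    entropy_tuning=True,           # tune the entropy coefficient automatically
    target_entropy_multiplier=1.0,
    entropy_coef_lr=3e-4,
    split_rollouts_into_transitions=True,
    replay_buffer_size=1_000_000,
    initial_buffer_size=10_000,
    initial_sampling_policy=None,  # the Union type admits None
    rollouts_per_iteration=1,
    epoch_length=100,
    patience=10,
    rollout_evaluator=None,        # placeholder; a RolloutEvaluator instance in practice
)

# With maze installed, these could be passed straight through:
# from maze.train.trainers.sac.sac_algorithm_config import SACAlgorithmConfig
# algorithm_config = SACAlgorithmConfig(**sac_config)
```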
- entropy_coef : float
  Entropy coefficient to use if entropy tuning is set to false (called alpha in the original paper).
- entropy_tuning : bool
  Specify whether to tune the entropy in the return computation or use a static value (called alpha tuning in the original paper).
- initial_buffer_size : int
  The initial buffer size; until it is reached, transitions are sampled with the initial sampling policy.
- initial_sampling_policy : Union[maze.core.agent.policy.Policy, None, Mapping[str, Any], Any]
  The policy used to initially fill the replay buffer.
- rollout_evaluator : maze.train.trainers.common.evaluators.rollout_evaluator.RolloutEvaluator
  Rollout evaluator.
- split_rollouts_into_transitions : bool
  Specify whether all computed rollouts should be split into transitions before processing them.
- target_entropy_multiplier : float
  Optional multiplier for the target entropy. This value is multiplied with the default target entropy computation (called alpha tuning in the original paper):
  - discrete spaces: target_entropy = target_entropy_multiplier * (-0.98 * (-log(1 / cardinality(A))))
  - continuous spaces: target_entropy = target_entropy_multiplier * (-dim(A)) (e.g., -6 for HalfCheetah-v1)
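The two target entropy formulas above can be sketched in plain Python. This is a hedged illustration of the documented computation only; `target_entropy` is a hypothetical helper name, not part of the Maze API:

```python
import math

def target_entropy(multiplier: float, cardinality: int = None, action_dim: int = None) -> float:
    """Illustrates the default target entropy computation described above."""
    if cardinality is not None:
        # Discrete: -0.98 times log(cardinality(A)), the entropy of a uniform policy.
        return multiplier * (-0.98 * (-math.log(1.0 / cardinality)))
    # Continuous: the negative dimensionality of the action space.
    return multiplier * (-float(action_dim))

# Continuous: HalfCheetah-v1 has a 6-dimensional action space.
print(target_entropy(1.0, action_dim=6))                 # -6.0
# Discrete: 4 actions.
print(round(target_entropy(1.0, cardinality=4), 4))      # -1.3586
```

With `multiplier=1.0` these are exactly the defaults described above; `target_entropy_multiplier` simply scales the result.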