ImpalaAlgorithmConfig

class maze.train.trainers.impala.impala_algorithm_config.ImpalaAlgorithmConfig(n_epochs: int, epoch_length: int, patience: int, critic_burn_in_epochs: int, n_rollout_steps: int, lr: float, gamma: float, policy_loss_coef: float, value_loss_coef: float, entropy_coef: float, max_grad_norm: float, device: str, queue_out_of_sync_factor: float, actors_batch_size: int, num_actors: int, vtrace_clip_rho_threshold: float, vtrace_clip_pg_rho_threshold: float, rollout_evaluator: maze.train.trainers.common.evaluators.rollout_evaluator.RolloutEvaluator)

Algorithm parameters for IMPALA (Importance Weighted Actor-Learner Architecture).

actors_batch_size: int

Number of actors to combine into one batch.

critic_burn_in_epochs: int

Number of critic (value function) burn-in epochs.
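A burn-in phase usually means the critic is trained on its own for the first few epochs before the policy starts being updated. The sketch below shows that schedule. It assumes this reading of `critic_burn_in_epochs` and 0-based epoch counting. The helper `losses_to_optimize` is hypothetical and not part of Maze.

```python
from typing import Set


def losses_to_optimize(epoch: int, critic_burn_in_epochs: int) -> Set[str]:
    """Return which loss terms are optimized in a given 0-based epoch.

    Assumption (hypothetical helper): during burn-in only the critic (value)
    loss is optimized and the policy is left untouched.
    """
    if epoch < critic_burn_in_epochs:
        return {"value"}
    return {"policy", "value"}


for epoch in range(4):
    print(epoch, sorted(losses_to_optimize(epoch, critic_burn_in_epochs=2)))
```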

device: str

Either “cpu” or “cuda”

entropy_coef: float

Weight of the entropy loss.

epoch_length: int

number of updates per epoch

gamma: float

Discount factor.
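The discount factor `gamma` weights future rewards in the return $G_t = r_t + \gamma G_{t+1}$. The sketch below computes these returns backwards over a finite rollout, bootstrapping from a value estimate at the end. It is a generic illustration, and the function name `discounted_returns` is hypothetical.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float,
                       bootstrap_value: float = 0.0) -> List[float]:
    """Compute G_t = r_t + gamma * G_{t+1}, starting from bootstrap_value after the last step."""
    returns = [0.0] * len(rewards)
    running = bootstrap_value
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


print(discounted_returns([1.0, 1.0, 1.0], gamma=0.9))
```

With `gamma=0.9` and three unit rewards this yields approximately `[2.71, 1.9, 1.0]`.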

lr: float

learning rate

max_grad_norm: float

The maximum allowed gradient norm during training
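Gradient norm clipping is commonly done by rescaling all gradients jointly whenever their global L2 norm exceeds the limit. This is similar in spirit to PyTorch's `torch.nn.utils.clip_grad_norm_`. The NumPy sketch below assumes that Maze applies `max_grad_norm` in this global-norm fashion. The helper `clip_by_global_norm` is hypothetical.

```python
from typing import List

import numpy as np


def clip_by_global_norm(grads: List[np.ndarray], max_grad_norm: float) -> List[np.ndarray]:
    """Jointly rescale gradients so their global L2 norm is at most max_grad_norm."""
    total_norm = float(np.sqrt(sum(float(np.sum(g ** 2)) for g in grads)))
    # The epsilon avoids division by zero; capping the scale at 1 leaves small gradients unchanged.
    scale = min(1.0, max_grad_norm / (total_norm + 1e-6))
    return [g * scale for g in grads]


clipped = clip_by_global_norm([np.array([3.0, 4.0]), np.array([0.0])], max_grad_norm=1.0)
print(clipped)
```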

n_epochs: int

number of epochs to train

n_rollout_steps: int

Number of steps taken for each rollout
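`n_epochs`, `epoch_length`, `actors_batch_size`, and `n_rollout_steps` together determine the training budget. The sketch below derives it under two assumptions: each learner update consumes one batch of `actors_batch_size` rollouts, and each rollout has `n_rollout_steps` steps from a single environment. The helper `training_budget` and the example values are hypothetical and are not Maze defaults.

```python
from typing import Dict


def training_budget(n_epochs: int, epoch_length: int,
                    actors_batch_size: int, n_rollout_steps: int) -> Dict[str, int]:
    """Estimate updates and environment steps consumed by training (hypothetical helper)."""
    total_updates = n_epochs * epoch_length  # epoch_length = updates per epoch
    # Assumption: one learner update consumes actors_batch_size rollouts of n_rollout_steps each.
    steps_per_update = actors_batch_size * n_rollout_steps
    return {
        "total_updates": total_updates,
        "steps_per_update": steps_per_update,
        "total_env_steps": total_updates * steps_per_update,
    }


print(training_budget(n_epochs=10, epoch_length=25, actors_batch_size=8, n_rollout_steps=100))
```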

num_actors: int

number of actors to be run

patience: int

number of steps used for early stopping
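Patience-based early stopping typically halts training once the tracked score has failed to improve for `patience` consecutive evaluations. The sketch below shows that rule. It assumes higher scores are better and that one evaluation happens per counted step. The helper `early_stopping_index` is hypothetical.

```python
from typing import List


def early_stopping_index(scores: List[float], patience: int) -> int:
    """Return the index of the evaluation at which training stops, or -1 if it never stops."""
    best = float("-inf")
    since_best = 0
    for i, score in enumerate(scores):
        if score > best:
            best, since_best = score, 0  # new best: reset the counter
        else:
            since_best += 1
            if since_best >= patience:
                return i
    return -1


print(early_stopping_index([1.0, 2.0, 3.0, 3.0, 2.0, 2.0], patience=3))
```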

policy_loss_coef: float

Weight of the policy loss.

queue_out_of_sync_factor: float

This factor multiplied by actors_batch_size gives the size of the queue in which the learner collects the actors' outputs. Therefore, all computed rollouts can be at most (queue_out_of_sync_factor + num_actors / actors_batch_size) learner updates out of sync with the learner policy.
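The arithmetic in this description can be written out directly. The sketch below computes the queue capacity and the worst-case policy lag. It assumes the capacity is rounded down to whole rollouts, which the description does not specify. The helper `queue_settings` is hypothetical.

```python
from typing import Tuple


def queue_settings(queue_out_of_sync_factor: float, actors_batch_size: int,
                   num_actors: int) -> Tuple[int, float]:
    """Return (queue size in rollouts, maximum lag in learner updates)."""
    # Assumption: the queue capacity is rounded down to a whole number of rollouts.
    queue_size = int(queue_out_of_sync_factor * actors_batch_size)
    # Batches already waiting in the queue plus batches still being produced by running actors.
    max_lag_updates = queue_out_of_sync_factor + num_actors / actors_batch_size
    return queue_size, max_lag_updates


print(queue_settings(queue_out_of_sync_factor=1.0, actors_batch_size=8, num_actors=16))  # (8, 3.0)
```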

rollout_evaluator: maze.train.trainers.common.evaluators.rollout_evaluator.RolloutEvaluator

Rollout evaluator.

value_loss_coef: float

Weight of the value loss.
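The three weights `policy_loss_coef`, `value_loss_coef`, and `entropy_coef` are usually combined into a single scalar loss in actor-critic methods. The sketch below shows that common formulation, in which the entropy bonus is subtracted to encourage exploration. Whether Maze combines the terms exactly this way is an assumption. The helper `total_loss` is hypothetical.

```python
def total_loss(policy_loss: float, value_loss: float, entropy: float,
               policy_loss_coef: float, value_loss_coef: float,
               entropy_coef: float) -> float:
    """Weighted actor-critic loss (hypothetical helper).

    policy_loss is assumed to already carry the minus sign of the policy
    gradient objective; entropy is subtracted so that minimizing the loss
    increases policy entropy.
    """
    return (policy_loss_coef * policy_loss
            + value_loss_coef * value_loss
            - entropy_coef * entropy)


print(total_loss(policy_loss=0.5, value_loss=2.0, entropy=1.5,
                 policy_loss_coef=1.0, value_loss_coef=0.5, entropy_coef=0.01))
```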

vtrace_clip_pg_rho_threshold: float

A scalar clipping threshold on $\rho_s$ in the policy-gradient term $\rho_s \nabla \log \pi(a_s \mid x_s)\,(r_s + \gamma v_{s+1} - V(x_s))$. If None, no clipping is applied.

vtrace_clip_rho_threshold: float

A scalar clipping threshold for the importance weights ($\rho$) used when calculating the baseline targets ($v_s$). This corresponds to $\bar{\rho}$ in the IMPALA paper. If None, no clipping is applied.
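The two thresholds enter the V-trace computation from the IMPALA paper in different places. `vtrace_clip_rho_threshold` clips the importance weights that scale the temporal-difference terms of the value targets $v_s$. `vtrace_clip_pg_rho_threshold` clips the weights that scale the policy-gradient advantages. The sketch below implements V-trace for a single trajectory in NumPy. It is not Maze's implementation. The function name `vtrace_from_importance_weights` and its argument layout are assumptions, and `discounts[t]` is assumed to be `gamma`, or 0 where an episode terminated.

```python
from typing import Optional, Tuple

import numpy as np


def vtrace_from_importance_weights(
    log_rhos: np.ndarray,        # log(pi(a_t|x_t) / mu(a_t|x_t)), shape (T,)
    discounts: np.ndarray,       # gamma, or 0 at episode ends, shape (T,)
    rewards: np.ndarray,         # shape (T,)
    values: np.ndarray,          # V(x_t) from the learner's critic, shape (T,)
    bootstrap_value: float,      # V(x_T) after the last step
    clip_rho_threshold: Optional[float] = 1.0,
    clip_pg_rho_threshold: Optional[float] = 1.0,
) -> Tuple[np.ndarray, np.ndarray]:
    """Return V-trace value targets vs and policy-gradient advantages."""
    rhos = np.exp(log_rhos)
    # rho-bar clipping for the value targets (None disables clipping).
    clipped_rhos = rhos if clip_rho_threshold is None else np.minimum(clip_rho_threshold, rhos)
    # Trace-cutting coefficients c_t, truncated at 1 as in the paper.
    cs = np.minimum(1.0, rhos)
    values_t_plus_1 = np.append(values[1:], bootstrap_value)
    deltas = clipped_rhos * (rewards + discounts * values_t_plus_1 - values)

    # Backward recursion: vs_t - V(x_t) = delta_t + discount_t * c_t * (vs_{t+1} - V(x_{t+1})).
    vs_minus_v = np.zeros(len(values), dtype=float)
    acc = 0.0
    for t in reversed(range(len(values))):
        acc = deltas[t] + discounts[t] * cs[t] * acc
        vs_minus_v[t] = acc
    vs = vs_minus_v + values

    # Policy-gradient advantages use their own clipping threshold.
    vs_t_plus_1 = np.append(vs[1:], bootstrap_value)
    clipped_pg_rhos = rhos if clip_pg_rho_threshold is None else np.minimum(clip_pg_rho_threshold, rhos)
    pg_advantages = clipped_pg_rhos * (rewards + discounts * vs_t_plus_1 - values)
    return vs, pg_advantages


T = 3
vs, adv = vtrace_from_importance_weights(
    log_rhos=np.zeros(T), discounts=np.full(T, 0.9), rewards=np.ones(T),
    values=np.zeros(T), bootstrap_value=0.0)
print(vs)  # approximately [2.71, 1.9, 1.0]
```

In the on-policy case (`log_rhos` all zero) no clipping is active, and the targets reduce to ordinary bootstrapped n-step returns. With off-policy data, the thresholds bound how much any single importance weight can inflate the targets or the advantages.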