class maze.train.parallelization.distributed_actors.subproc_distributed_actors.SubprocDistributedActors(env_factory: Callable[[], Union[maze.core.env.structured_env.StructuredEnv, maze.core.env.structured_env_spaces_mixin.StructuredEnvSpacesMixin, maze.core.log_stats.log_stats_env.LogStatsEnv]], policy: maze.core.agent.torch_policy.TorchPolicy, n_rollout_steps: int, n_actors: int, batch_size: int, queue_out_of_sync_factor: float, start_method: str, actor_env_seeds: List[int], actor_agent_seeds: List[int])

Basic distributed actors module using Python's multiprocessing.Process.

  • queue_out_of_sync_factor – this factor multiplied by the batch size gives the size of the actor output queue. The collected rollouts can therefore be at most (queue_out_of_sync_factor + num_agents/actor_batch_size) out of sync with the learner policy.

  • start_method – Method used to start the subprocesses. Must be one of the methods returned by multiprocessing.get_all_start_methods(). Defaults to ‘forkserver’ on available platforms, and ‘spawn’ otherwise.

  • actor_env_seeds – A list of seeds for each actor's env.

  • actor_agent_seeds – A list of seeds for each actor's policy.
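The queue sizing described for queue_out_of_sync_factor can be sketched as follows. This helper is illustrative only (`make_actor_output_queue` is not part of the Maze API); it just mirrors the stated relation between batch size, out-of-sync factor, and queue capacity:

```python
import math
import multiprocessing


def make_actor_output_queue(batch_size: int, queue_out_of_sync_factor: float):
    """Illustrative helper (not Maze API): size the actor output queue as
    queue_out_of_sync_factor * batch_size, rounded up to a whole number
    of rollouts, so actors can run at most that far ahead of the learner."""
    maxsize = math.ceil(queue_out_of_sync_factor * batch_size)
    ctx = multiprocessing.get_context("spawn")
    return ctx.Queue(maxsize=maxsize), maxsize


queue, maxsize = make_actor_output_queue(batch_size=8, queue_out_of_sync_factor=2.0)
print(maxsize)  # 16
```

With a batch size of 8 and a factor of 2.0 the queue holds at most 16 rollouts, bounding how stale the actors' policy can become relative to the learner.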

broadcast_updated_policy(state_dict: Dict) → None

(overrides DistributedActors)

Store the newest policy in the shared network object.

collect_outputs(learner_device: str) → Tuple[maze.core.trajectory_recording.records.structured_spaces_record.StructuredSpacesRecord, float, float, float]

(overrides DistributedActors)

Collect actor outputs from the multiprocessing queue.
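The collection step can be illustrated with a plain queue. This is a simplified stand-in: the real method also stacks the records and reports timing statistics, and `collect_batch` is a hypothetical name:

```python
import queue


def collect_batch(actor_output_queue, batch_size):
    """Illustrative sketch: block until batch_size actor outputs have been
    dequeued, then hand the batch to the learner."""
    outputs = []
    while len(outputs) < batch_size:
        outputs.append(actor_output_queue.get())  # blocking dequeue
    return outputs


q = queue.Queue()
for rollout_id in range(4):
    q.put({"rollout": rollout_id})

batch = collect_batch(q, batch_size=4)
print(len(batch))  # 4
```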

get_multiprocessing_context(start_method: str)

Get the right context for multiprocessing.

Fork is the best option, but it is only available on Unix systems and does not support running actors and learner on GPU. Forkserver is the second choice, spawn the third.
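The preference order described above can be sketched as follows. This helper is an illustrative assumption, not the exact Maze implementation (in particular it omits the GPU check mentioned above):

```python
import multiprocessing


def get_multiprocessing_context(start_method=None):
    """Illustrative sketch: honour an explicit start_method if given,
    otherwise fall back through fork > forkserver > spawn, restricted to
    what the current platform supports."""
    available = multiprocessing.get_all_start_methods()
    if start_method is not None:
        if start_method not in available:
            raise ValueError(f"start method {start_method!r} is not available")
        return multiprocessing.get_context(start_method)
    for method in ("fork", "forkserver", "spawn"):
        if method in available:
            return multiprocessing.get_context(method)


ctx = get_multiprocessing_context()
print(ctx.get_start_method())  # platform-dependent, e.g. "fork" on Linux
```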

start() → None

(overrides DistributedActors)

Start all processes in the self.actors list.

stop() → None

(overrides DistributedActors)

Stop and kill all actor processes.