AgentDeployment

class maze.core.agent_deployment.agent_deployment.AgentDeployment(policy: Union[None, Mapping[str, Any], Any], env: Union[None, Mapping[str, Any], Any], wrappers: Union[List[Union[None, Mapping[str, Any], Any]], Mapping[Union[str, Type], Union[None, Mapping[str, Any], Any]]] = None)

Encapsulates an agent, its space interfaces, and a stack of wrappers to make the agent’s MazeActions accessible to an external env.

Note: The policy, env, and wrappers parameters are compatible with the Hydra configuration used for rollouts. Alternatively, an already instantiated policy or env can be passed in.
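The policy and env parameters therefore accept either a config mapping or a ready-made object. The sketch below shows one way such a parameter could be resolved. It assumes the Hydra convention of a `_target_` key naming the callable to instantiate; the helper `build_if_config` is hypothetical and is not Maze's own factory. Unlike Hydra's `instantiate`, it does not recurse into nested configs.

```python
import collections.abc
import importlib
from typing import Any, Mapping, Union


def build_if_config(obj: Union[None, Mapping[str, Any], Any]) -> Any:
    """Hypothetical helper: instantiate Hydra-style configs, pass objects through.

    A mapping is assumed to carry a ``_target_`` key with the dotted path of the
    callable to build; the remaining keys become keyword arguments.
    """
    if obj is None or not isinstance(obj, collections.abc.Mapping):
        return obj  # None or an already instantiated object: returned unchanged
    config = dict(obj)
    module_name, _, attr_name = config.pop("_target_").rpartition(".")
    target = getattr(importlib.import_module(module_name), attr_name)
    return target(**config)


# A config mapping gets instantiated ...
counter = build_if_config({"_target_": "collections.Counter", "a": 2})
# ... while an existing instance is returned as-is.
existing = object()
```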

How it works:

The external env should supply states to the agent deployment object and can then query it for the agent’s MazeActions. The agent, with the supplied policy (or multiple policies), runs on a separate thread.

Note that the two threads (the main thread, which calls this deployment object, and the second thread, which runs the agent, wrappers, etc.) never run in parallel: one of them is always suspended. This is enforced using the queues. Either the main thread runs while the agent thread waits for the state to be passed from the main thread, or the agent thread runs (computing the MazeAction) while the main thread waits until the MazeAction is passed back. The agent thread is then suspended again until the next state is passed in via the queue.

The queues have a maximum size of one, ensuring that only one step is taken at a time.
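A minimal, standard-library sketch of this queue handshake follows. The names `state_queue`, `action_queue`, and `agent_loop` are hypothetical, and the doubling "policy" is a placeholder. Each side blocks on `get()` while the other runs, so the recorded log always alternates strictly between the two threads.

```python
import queue
import threading

state_queue = queue.Queue(maxsize=1)   # main thread -> agent thread
action_queue = queue.Queue(maxsize=1)  # agent thread -> main thread
log = []  # records which thread did work, in order


def agent_loop() -> None:
    """Agent thread: wait for a state, compute an action, hand it back."""
    while True:
        state = state_queue.get()  # suspended until the main thread sends a state
        if state is None:  # sentinel value: shut down
            break
        log.append(f"agent computes action for {state}")
        action_queue.put(state * 2)  # placeholder policy; wakes the main thread


agent = threading.Thread(target=agent_loop, daemon=True)
agent.start()

actions = []
for state in [1, 2, 3]:
    log.append(f"main sends {state}")
    state_queue.put(state)               # hand control to the agent thread
    actions.append(action_queue.get())   # main thread suspended until the action arrives

state_queue.put(None)
agent.join()
```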

Parameters

policy – Structured policy to query for actions, or a config which will be used to build the policy.
env – Either an instantiated simulation environment, which will be used for action and observation processing (i.e., the Maze env, its wrapper stack, and the env context will be used), or a config for instantiating such an env.
wrappers – Configuration for (additional) wrappers, if required.

act(maze_state: Any, reward: Union[None, float, numpy.ndarray, Any], done: bool, info: Union[None, Dict[Any, Any]], events: Optional[List[maze.core.events.event_record.EventRecord]] = None, actor_id: maze.core.env.structured_env.ActorID = ActorID(step_key=0, agent_id=0)) → Any

Query the agent for a MazeAction derived from the given state.

Passes the state (together with the reward, done flag, info, and events) to the agent’s thread, where it is integrated into an ordinary env rollout loop. In the first step, an env reset call is propagated through the env wrapper stack on the agent’s thread.
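The agent-side rollout loop can be illustrated as follows. For brevity, a Python generator stands in for the agent thread (it gives the same strict alternation without threads), and `RecordingEnv` is a hypothetical placeholder for the env and its wrapper stack. The first externally supplied state is delivered through `reset()`, and every later state through `step()`.

```python
from typing import Any, Generator, List


class RecordingEnv:
    """Hypothetical placeholder for the env plus its wrapper stack.

    It does not simulate anything: reset/step return the state most recently
    supplied by the external env, and every call is recorded.
    """

    def __init__(self) -> None:
        self.calls: List[str] = []
        self.pending_state: Any = None

    def reset(self) -> Any:
        self.calls.append("reset")
        return self.pending_state

    def step(self, action: Any) -> Any:
        self.calls.append(f"step({action})")
        return self.pending_state


def rollout_loop(env: RecordingEnv) -> Generator[Any, Any, None]:
    """Ordinary rollout loop; each `yield` hands control back to the caller."""
    env.pending_state = yield  # wait for the first external state
    obs = env.reset()  # first step: propagated as a reset
    while True:
        action = obs + 100  # toy policy
        env.pending_state = yield action  # hand the action out, wait for the next state
        obs = env.step(action)  # later steps: propagated as step calls


env = RecordingEnv()
agent = rollout_loop(env)
next(agent)  # prime the generator up to its first `yield`
actions = [agent.send(state) for state in [1, 2, 3]]
```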

Parameters

maze_state – Current state of the environment.
reward – Reward for the previous step (can be None in the initial step)
done – Whether the external environment is done
info – Info dictionary
events – List of events to be recorded for this step (mainly useful for statistics and event logs)
actor_id – Optional ID of the actor to run next (composed of step_key and agent_id)

Returns

MazeAction from the agent
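From the external environment's side, a rollout reduces to calling `act` once per step and `close` once at the end. Because building a real policy and env requires a full Maze setup, `StubDeployment` is a hypothetical stand-in that accepts a subset of the documented parameters; with Maze installed, an `AgentDeployment(policy=..., env=...)` instance would take its place. The toy external dynamics simply advance the state by one per step.

```python
from typing import Any, Dict, List, Optional


class StubDeployment:
    """Hypothetical stand-in mimicking the act/close call pattern."""

    def __init__(self) -> None:
        self.rewards: List[Optional[float]] = []
        self.closed = False

    def act(self, maze_state: Any, reward: Optional[float], done: bool,
            info: Optional[Dict[Any, Any]]) -> Any:
        self.rewards.append(reward)
        return -maze_state  # toy "MazeAction"

    def close(self, maze_state: Any, reward: float, done: bool,
              info: Dict[Any, Any]) -> None:
        self.rewards.append(reward)
        self.closed = True


def run_external_episode(deployment: StubDeployment, n_steps: int) -> List[Any]:
    """Drive the deployment from a toy external simulation."""
    state: int = 10
    reward: Optional[float] = None  # no reward exists before the first step
    actions: List[Any] = []
    for _ in range(n_steps):
        action = deployment.act(state, reward, done=False, info={})
        actions.append(action)
        state, reward = state + 1, 1.0  # toy external dynamics
    # Final call: lets the wrapper stack finish the episode.
    deployment.close(state, reward if reward is not None else 0.0, done=True, info={})
    return actions
```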

close(maze_state: Any, reward: Union[float, numpy.ndarray, Any], done: bool, info: Dict[Any, Any], events: Optional[List[maze.core.events.event_record.EventRecord]] = None)

Should be called when the rollout is finished. While this has no effect on the MazeActions already provided, it passes an env reset call through the wrapper stack, enabling the wrappers to do any work they normally do at the end of an episode (such as writing trajectory data).
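To see why this closing reset matters, consider a wrapper that buffers episode data and only writes it out when the next reset arrives. `EpisodeBufferWrapper` and `ToyEnv` are hypothetical and are not Maze's actual wrappers; the in-memory `written_episodes` list stands in for files on disk.

```python
from typing import Any, List


class ToyEnv:
    """Minimal inner env: the state is just a step counter."""

    def __init__(self) -> None:
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action: Any) -> int:
        self.t += 1
        return self.t


class EpisodeBufferWrapper:
    """Hypothetical wrapper that buffers an episode and flushes it on reset."""

    def __init__(self, env: ToyEnv) -> None:
        self.env = env
        self.buffer: List[Any] = []
        self.written_episodes: List[List[Any]] = []  # stands in for files on disk

    def reset(self) -> int:
        if self.buffer:  # a previous episode ended: write out its data
            self.written_episodes.append(self.buffer)
            self.buffer = []
        return self.env.reset()

    def step(self, action: Any) -> int:
        self.buffer.append(action)
        return self.env.step(action)


wrapper = EpisodeBufferWrapper(ToyEnv())
wrapper.reset()
for action in ["a", "b", "c"]:
    wrapper.step(action)
written_before_close = len(wrapper.written_episodes)  # nothing flushed yet
wrapper.reset()  # analogous to the reset that close() propagates
```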

Parameters

maze_state – Final state of the rollout
reward – Reward for the previous step
done – Whether the external environment is done
info – Info dictionary
events – List of events to be recorded for this step (mainly useful for statistics and event logs)