  • Installation
  • A First Example
    • Training and Rollouts
    • Tensorboard
    • Training Outputs
  • Maze - Step by Step
    • 1. Cutting-2D Problem Specification
    • 2. Implementing the CoreEnv
      • 2.1. CoreEnv
      • 2.2. Environment Components
      • 2.3. MazeState and MazeAction
      • 2.4. Test Script
    • 3. Adding a Renderer
      • 3.1. Renderer
      • 3.2. Updating the CoreEnv
      • 3.3. Test Script
    • 4. Implementing the MazeEnv
      • 4.1. MazeEnv
      • 4.2. ObservationConversionInterface
      • 4.3. ActionConversionInterface
      • 4.4. Updating the CoreEnv
      • 4.5. Test Script
    • 5. Training the MazeEnv
      • 5.1. Hydra Configuration
      • 5.2. Training an Agent
    • 6. Adding Events and KPIs
      • 6.1. Events
      • 6.2. KPI Calculator
      • 6.3. Updating CoreEnv and Inventory
      • 6.4. Test Script
    • 7. Training with Events and KPIs
      • 7.1. Hydra Configuration
      • 7.2. Training an Agent
    • 8. Adding Reward Customization
      • 8.1. Reward
      • 8.2. Updating the Core- and MazeEnv
      • 8.3. Where to Go Next
  • API Documentation
    • Environment Interfaces
      • maze.core.env
        • BaseEnv
        • ActorID
        • StructuredEnv
        • CoreEnv
        • StructuredEnvSpacesMixin
        • MazeEnv
        • RenderEnvMixin
        • RecordableEnvMixin
        • SerializableEnvMixin
        • TimeEnvMixin
        • EventEnvMixin
        • SimulatedEnvMixin
        • ObservationConversionInterface
        • ActionConversionInterface
        • MazeStateType
        • MazeActionType
        • RewardAggregatorInterface
        • EnvironmentContext
    • Environment Wrappers
      • Interfaces and Utilities
        • Wrapper
        • ObservationWrapper
        • ActionWrapper
        • RewardWrapper
        • WrapperFactory
      • Built-in Wrappers
        • LogStatsWrapper
        • MazeEnvMonitoringWrapper
        • ObservationVisualizationWrapper
        • TimeLimitWrapper
        • RandomResetWrapper
        • SortedSpacesWrapper
        • NoDictSpacesWrapper
        • DictObservationWrapper
        • ObservationStackWrapper
        • NoDictObservationWrapper
        • DictActionWrapper
        • NoDictActionWrapper
        • SplitActionsWrapper
        • DiscretizeActionsWrapper
        • RewardScalingWrapper
        • RewardClippingWrapper
        • ReturnNormalizationRewardWrapper
      • Observation Pre-Processing Wrapper
        • PreProcessingWrapper
        • PreProcessor
        • FlattenPreProcessor
        • OneHotPreProcessor
        • ResizeImgPreProcessor
        • TransposePreProcessor
        • UnSqueezePreProcessor
        • Rgb2GrayPreProcessor
      • Observation Normalization Wrapper
        • ObservationNormalizationWrapper
        • ObservationNormalizationStrategy
        • obtain_normalization_statistics
        • estimate_observation_normalization_statistics
        • make_normalized_env_factory
        • MeanZeroStdOneObservationNormalizationStrategy
        • RangeZeroOneObservationNormalizationStrategy
      • Gym Environment Wrapper
        • GymMazeEnv
        • make_gym_maze_env
        • GymCoreEnv
        • GymRenderer
        • GymObservationConversion
        • GymActionConversion
    • Event System, Logging & Statistics
      • Event System
        • Subscriber
        • Pubsub
        • event_topic_factory
        • EventScope
        • EventService
        • EventCollection
        • EventRecord
      • Event Logging
        • StepEventLog
        • EpisodeEventLog
        • KpiCalculator
        • LogEventsWriterRegistry
        • LogEventsWriter
        • LogEventsWriterTSV
        • EventRow
        • SimpleEventLoggingSetup
        • ObservationEvents
        • ActionEvents
        • RewardEvents
        • create_categorical_plot
        • create_histogram
        • create_relative_bar_plot
        • create_violin_distribution
      • Statistics Logging
        • LogStatsEnv
        • LogStatsWriterConsole
        • LogStatsWriterTensorboard
        • LogStatsLevel
        • LogStatsConsumer
        • LogStatsAggregator
        • LogStatsWriter
        • GlobalLogState
        • LogStatsLogger
        • register_log_stats_writer
        • log_stats
        • increment_log_step
        • get_stats_logger
        • define_step_stats
        • define_episode_stats
        • define_epoch_stats
        • define_stats_grouping
        • define_plot
        • histogram
        • LogStatsValue
        • LogStatsGroup
        • LogStatsKey
        • LogStats
    • Rendering
      • Renderer
      • StepStatsRenderer
      • EventStatsRenderer
      • NotebookEventLogsViewer
      • NotebookTrajectoryViewer
      • KeyboardControlledTrajectoryViewer
      • RendererArg
      • IntRangeArg
      • OptionsArrayArg
    • Trajectory Recorder
      • InMemoryDataset
      • DataLoadWorker
      • TrajectoryProcessor
      • IdentityTrajectoryProcessor
      • DeadEndClippingTrajectoryProcessor
      • SpacesRecord
      • StepKeyType
      • StructuredSpacesRecord
      • StateRecord
      • TrajectoryRecord
      • StateTrajectoryRecord
      • SpacesTrajectoryRecord
      • MonitoringSetup
      • SimpleTrajectoryRecordingSetup
      • TrajectoryWriterRegistry
      • TrajectoryWriter
      • TrajectoryWriterFile
    • General and Rollout Runners
      • General Runners
        • Runner
        • maze_run
      • Rollout Runners
        • RolloutRunner
        • RolloutGenerator
        • SequentialRolloutRunner
        • ParallelRolloutRunner
        • ParallelRolloutWorker
        • EpisodeRecorder
        • EpisodeStatsReport
        • ExceptionReport
    • Policies, Critics and Agents
      • maze.core.agent
        • FlatPolicy
        • Policy
        • TorchPolicy
        • PolicySubStepOutput
        • PolicyOutput
        • DefaultPolicy
        • RandomPolicy
        • DummyCartPolePolicy
        • SerializedTorchPolicy
        • StateCritic
        • StateCriticStepOutput
        • StateCriticOutput
        • StateCriticStepInput
        • StateCriticInput
        • TorchStateCritic
        • TorchSharedStateCritic
        • TorchStepStateCritic
        • TorchDeltaStateCritic
        • StateActionCritic
        • TorchStateActionCritic
        • TorchSharedStateActionCritic
        • TorchStepStateActionCritic
        • TorchModel
        • TorchActorCritic
    • Agent Deployment
      • AgentDeployment
      • PolicyExecutor
      • ActionCandidates
      • MazeActionCandidates
      • ActionConversionCandidatesInterface
      • ExternalCoreEnv
    • Perception Module
      • maze.perception.blocks
        • PerceptionBlock
        • ShapeNormalizationBlock
        • InferenceBlock
        • InferenceGraph
        • DenseBlock
        • VGGConvolutionBlock
        • StridedConvolutionBlock
        • GraphConvBlock
        • GraphAttentionBlock
        • MultiHeadAttentionBlock
        • PointNetFeatureBlock
        • LSTMBlock
        • FlattenBlock
        • CorrelationBlock
        • ConcatenationBlock
        • FunctionalBlock
        • GlobalAveragePoolingBlock
        • MaskedGlobalPoolingBlock
        • MultiIndexSlicingBlock
        • RepeatToMatchBlock
        • SelfAttentionConvBlock
        • SelfAttentionSeqBlock
        • SliceBlock
        • ActionMaskingBlock
        • TorchModelBlock
        • FlattenDenseBlock
        • VGGConvolutionDenseBlock
        • VGGConvolutionGAPBlock
        • StridedConvolutionDenseBlock
        • LSTMLastStepBlock
      • maze.perception.builders
        • BaseModelBuilder
        • ConcatModelBuilder
      • maze.perception.models
        • BaseModelComposer
        • TemplateModelComposer
        • CustomModelComposer
        • SpacesConfig
        • BasePolicyComposer
        • ProbabilisticPolicyComposer
        • CriticComposerInterface
        • BaseStateCriticComposer
        • SharedStateCriticComposer
        • StepStateCriticComposer
        • DeltaStateCriticComposer
        • StateCriticComposer
        • BaseStateActionCriticComposer
        • SharedStateActionCriticComposer
        • StepStateActionCriticComposer
        • StateActionCriticComposer
        • FlattenConcatBaseNet
        • FlattenConcatPolicyNet
        • FlattenConcatStateValueNet
      • maze.perception.perception_utils
        • observation_spaces_to_in_shapes
        • flatten_spaces
        • stack_and_flatten_spaces
        • convert_to_torch
        • convert_to_numpy
      • maze.perception.weight_init
        • make_module_init_normc
        • compute_sigmoid_bias
    • Action Spaces and Distributions Module
      • ProbabilityDistribution
      • TorchProbabilityDistribution
      • DistributionMapper
      • atanh
      • tensor_clamp
      • CategoricalProbabilityDistribution
      • BernoulliProbabilityDistribution
      • DiagonalGaussianProbabilityDistribution
      • SquashedGaussianProbabilityDistribution
      • BetaProbabilityDistribution
      • MultiCategoricalProbabilityDistribution
      • DictProbabilityDistribution
    • Core Utilities
      • override
      • unused
      • set_seeds_globally
      • MazeSeeding
      • flat_structured_space
      • flat_structured_shapes
      • read_config
      • list_to_dict
      • EnvFactory
      • make_env_from_hydra
      • Factory
      • ConfigType
      • CollectionOfConfigType
      • CumulativeMovingMeanStd
    • Utilities
      • maze.utils
        • SimpleStatsLoggingSetup
        • clear_global_state
        • setup_logging
        • Timeout
        • tensorboard_to_pandas
        • Process
        • BColors
      • maze.hydra_plugins
        • MazeLocalLauncher
        • LauncherConfig
    • Trainers and Training Runners
      • General
        • Trainer
        • TrainingRunner
        • TrainConfig
        • ModelConfig
        • AlgorithmConfig
        • ModelSelectionBase
        • BestModelSelection
        • Evaluator
        • MultiEvaluator
        • RolloutEvaluator
        • ValueTransform
        • ReduceScaleValueTransform
        • support_to_scalar
        • scalar_to_support
        • BaseReplayBuffer
        • UniformReplayBuffer
      • Trainers
        • Actor-Critics (AC)
        • Evolutionary Strategies (ES)
        • Imitation Learning (IL) and Learning from Demonstrations (LfD)
      • Utilities
        • stack_numpy_dict_list
        • unstack_numpy_list_dict
        • compute_gradient_norm
        • stack_torch_dict_list
    • Parallelization
      • Vectorized Environments
        • VectorEnv
        • StructuredVectorEnv
        • SequentialVectorEnv
        • SubprocVectorEnv
        • CloudpickleWrapper
        • SinkHoleConsumer
        • disable_epoch_level_stats
      • Distributed Actors
        • DistributedActors
        • SequentialDistributedActors
        • SubprocDistributedActors
        • BaseDistributedWorkersWithBuffer
        • DummyDistributedWorkersWithBuffer
      • Utilities
        • BroadcastingContainer
        • BroadcastingManager
    • Run Context
      • Utilities
        • ConfigurationAuditor
        • ConfigurationLoader
        • RunMode
        • RunContextError
        • InvalidSpecificationError
      • Run Context
        • RunContext

Workflow

  • Training
    • Example 1: Your First Training Run
    • Example 2: Customizing with Provided Components
    • Example 3: Resuming Previous Training Runs
    • Training in Your Custom Project
    • Plain Python Training
    • Where to Go Next
  • Rollouts
    • The First Rollout
    • Rollout Runner Configuration
    • Environment and Policy Configuration
    • Plain Python Configuration
    • Where to Go Next
  • Deployment
    • Building a Deployment Agent
    • How does this work under the hood?
    • Where to Go Next
  • Collecting and Visualizing Rollouts
    • Requirements
    • Trajectory Data Collection
    • Trajectory Visualization
    • Where to Go Next
  • Imitation Learning and Fine-Tuning
    • Collect Training Trajectory Data
    • Learn from Example Trajectories
    • Fine-Tune a Pre-Trained Policy
    • Where to Go Next
  • Experiment Configuration
    • Command Line Overrides
    • Experiment Config Files
    • Hyperparameter Grid Search
    • Hyperparameter Optimization
    • Where to Go Next

Policy and Value Networks

  • Introducing the Perception Module
    • List of Features
    • Perception Blocks
    • Inference Blocks
    • Model Composers
    • Implementing Custom Perception Blocks
    • The Bigger Picture
    • Where to Go Next
  • Action Spaces and Distributions
    • List of Features
    • Action Spaces and Probability Distributions
    • Example 1: Mapping Action Spaces to Distributions
    • Example 2: Mapping Actions to Distributions
    • Example 3: Using Custom Distributions
    • Example 4: Plain Python Configuration
    • The Bigger Picture
    • Where to Go Next
  • Working with Template Models
    • List of Features
    • Model Builders (Architecture Templates)
    • Example 1: Feed Forward Models
    • Example 2: Recurrent Models
    • Example 3: Single Observation and Action Models
    • Example 4: Shared Embedding Feed Forward Model
    • Where to Go Next
  • Working with Custom Models
    • List of Features
    • The Custom Models Signature
    • Example 1: Simple Networks with Perception Blocks
    • Example 2: Complex Networks with Perception Blocks
    • Example 3: Custom Networks with (plain PyTorch) Python
    • Example 4: Custom Shared embeddings with Perception Blocks
    • Where to Go Next

Training

  • Maze Trainers
    • Supported Spaces
    • Advantage Actor-Critic (A2C)
    • Proximal Policy Optimization (PPO)
    • Importance Weighted Actor-Learner Architecture (IMPALA)
    • Soft Actor-Critic (from Demonstrations) (SAC, SACfD)
    • Behavioural Cloning (BC)
    • Evolutionary Strategies (ES)
    • Maze RLlib Trainer
    • Where to Go Next
  • Maze RLlib Runner
    • List of Features
    • Example 1: Training with Maze-RLlib and Hydra
    • Example 2: Overwriting Training Parameters
    • Example 3: Training with RLlib’s Default Models
    • Supported Algorithms
    • The Bigger Picture
    • Good to Know
    • Where to Go Next

Concepts and Structure

  • Policies, Critics and Agents
    • Policies (Actors)
    • Value Functions (Critics)
    • Actor-Critics
    • Where to Go Next
  • Maze Environment Hierarchy
    • Core Environment
    • Gym-Space Interfaces
    • Maze Environment
    • Wrappers
    • Structured Environments
    • Where to Go Next
  • Maze Event System
    • Motivation
    • What is an event?
    • How are events used in Maze?
    • PubSub: Dispatching and Observing Events
    • EventEnvMixin Interface: Querying Events
    • Where to Go Next
  • Configuration with Hydra
    • Hydra: Overview
      • Introduction
      • Configuration Root, Groups and Defaults
      • Overrides
      • Output Directory
      • Maze Runner Concept
      • Where to Go Next
    • Hydra: Your Own Configuration Files
      • Step 1: Custom Config Module in Hydra Search Path
      • Step 2a: Custom Config Components
      • Step 2b: Experiment Config
      • Step 2c: Custom Root Config
      • Step 3: Custom Runners (Optional)
      • Where to Go Next
    • Hydra: Advanced Concepts
      • Maze Factory
      • Interpolation
      • Specializations
      • Where to Go Next
  • Environment Rendering
  • Structured Environments
    • Flat Environments
      • Control Flow
      • Where to Go Next
    • Multi-Stepping
      • Control Flow
      • Relation to Hierarchical RL
      • Relation to Auto-Regressive Action Distributions
      • Where to Go Next
    • Multi-Agent RL
      • Control Flow
      • Where to Go Next
    • Hierarchical RL
      • Motivating Example
      • Control Flow
      • Where to Go Next
    • Beyond Flat Environments with Actors
    • Where to Go Next
  • High-level API: RunContext
    • Motivation
    • Comparison with the CLI (maze-run)
    • Usage

Environment Customization

  • Customizing Core and Maze Envs
    • List of Features
    • Example: Core- and MazeEnv Configuration
    • Where to Go Next
  • Customizing / Shaping Rewards
    • List of Features
    • Configuring the CoreEnv
    • Implementing a Custom Reward
    • Where to Go Next
  • Environment Wrappers
    • List of Features
    • Example 1: Customizing Environments with Wrappers
    • Example 2: Using Custom Wrappers
    • Example 3: Plain Python Configuration
    • Built-in Wrappers
    • Where to Go Next
  • Observation Pre-Processing
    • List of Features
    • Example 1: Observation Specific Pre-Processors
    • Example 2: Cascaded Pre-Processing
    • Example 3: Using Custom Pre-Processors
    • Example 4: Plain Python Configuration
    • Built-in Pre-Processors
    • Where to Go Next
  • Observation Normalization
    • List of Features
    • Example 1: Normalization with Estimated Statistics
    • Example 2: Normalization with Manual Statistics
    • Example 3: Custom Normalization and Excluding Observations
    • Example 4: Using Custom Normalization Strategies
    • Example 5: Plain Python Configuration
    • Built-in Normalization Strategies
    • The Bigger Picture
    • Where to Go Next

Best Practices and Tutorials

  • Tricks of the Trade
    • Learning and Optimization
    • Models and Networks
    • Observations
  • Cheat Sheet
  • Integrating an Existing Gym Environment
    • Instantiating a Gym Environment as a Maze Environment
    • Test your own Gym Environment with Maze
    • Where to Go Next
  • Structured Environments and Action Masking
    • Turning a “flat” MazeEnv into a StructuredEnv
      • Analyzing the Problem Structure
      • Implementing the Structured Environment
      • Test Script
    • Training the Structured Environment
      • A Simple Problem Setting
      • Task-Specific Actor-Critic Model
      • Multi-Step Training
    • Adding Step-Conditional Action Masking
      • Masked Structured Environment
      • Test Script
    • Training with Action Masking
      • Masked Policy Models
      • Retraining with Masking
      • In Depth Inspection of Learning Progress
  • Combining Maze with other RL Frameworks
    • Reusing Environment Customization Features
    • Reusing the Hydra Configuration System
    • Where to Go Next
  • Plain Python Training Example (high-level)
    • Environment Setup
    • Algorithm Setup
    • Custom Model Setup
    • Full Python Code
  • Plain Python Training Example (low-level)
    • Environment Setup
    • Model Setup
    • Trainer Setup
    • Train the Agent
    • Full Python Code

Logging and Monitoring

  • Tensorboard and Command Line Logging
    • Tensorboard Logging
    • Command Line Logging
    • Where to Go Next
  • Event and KPI Logging
    • Events
    • Key Performance Indicators (KPIs)
    • Plain Python Configuration
    • Where to Go Next
  • Action Distribution Visualization
    • Discrete and Multi Binary Actions
    • Continuous Actions
    • Where to Go Next
  • Observation Logging
    • Observation Distribution Visualization
    • Observation Visualization
    • Where to Go Next

Scaling the Training Process

  • Runner Concept

API Documentation

This page provides an overview of the Maze API documentation.

Contents

  • Environment Interfaces
  • Environment Wrappers
  • Event System, Logging & Statistics
  • Rendering
  • Trajectory Recorder
  • General and Rollout Runners
  • Policies, Critics and Agents
  • Agent Deployment
  • Perception Module
  • Action Spaces and Distributions Module
  • Core Utilities
  • Utilities
  • Trainers and Training Runners
  • Parallelization
  • Run Context

© Copyright 2021, EnliteAI GmbH. Revision b0a8d812.
