ReturnNormalizationRewardWrapper

class maze.core.wrappers.return_normalization_reward_wrapper.ReturnNormalizationRewardWrapper(*args, **kwds)

Normalizes the step reward by dividing it by the standard deviation of the discounted return.

Implementation adapted from: https://github.com/MadryLab/implementation-matters

Parameters
  • env – The underlying environment.

  • gamma – The discounting factor (e.g., 0.99).

  • epsilon – Ensures numerical stability and avoids division by zero (e.g., 1e-8).
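
The normalization described above can be illustrated with a minimal, self-contained sketch. This is not the Maze implementation: the class name RunningReturnNormalizer, its method names, the use of Welford's online variance update, and the fallback to a standard deviation of 1.0 while fewer than two returns have been seen are all assumptions made for illustration. Only the roles of gamma and epsilon follow the parameter descriptions.

```python
import math


class RunningReturnNormalizer:
    """Hypothetical helper (not part of Maze) illustrating the idea."""

    def __init__(self, gamma: float = 0.99, epsilon: float = 1e-8):
        self.gamma = gamma      # discounting factor
        self.epsilon = epsilon  # guards against division by zero
        self.ret = 0.0          # discounted return of the current episode
        # Welford accumulators for the statistics of all returns seen so far.
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0

    def reset(self) -> None:
        """Start a new episode; the running statistics are kept."""
        self.ret = 0.0

    def std(self) -> float:
        """Sample standard deviation of the discounted returns seen so far."""
        if self.count < 2:
            # Assumption of this sketch: no scaling until two samples exist.
            return 1.0
        return math.sqrt(self.m2 / (self.count - 1))

    def normalize(self, reward: float) -> float:
        """Update the discounted return and scale the raw step reward."""
        self.ret = self.gamma * self.ret + reward
        # Welford's online update of mean and sum of squared deviations.
        self.count += 1
        delta = self.ret - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (self.ret - self.mean)
        return reward / (self.std() + self.epsilon)
```

In this sketch, normalize would be called once per environment step and reset at the start of each episode. Note that the reward is only scaled, not shifted by a mean, so its sign is preserved.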

clone_from(env: maze.core.wrappers.return_normalization_reward_wrapper.ReturnNormalizationRewardWrapper) → None

(overrides SimulatedEnvMixin)

implementation of SimulatedEnvMixin.

reset()

implementation of RewardWrapper

reward(reward: float) → float

(overrides RewardWrapper)

implementation of RewardWrapper