ReturnNormalizationRewardWrapper
class maze.core.wrappers.return_normalization_reward_wrapper.ReturnNormalizationRewardWrapper(*args, **kwds)
Normalizes the step reward by dividing it by the standard deviation of the discounted return.
Implementation adopted from: https://github.com/MadryLab/implementation-matters
- Parameters
env – The underlying environment.
gamma – The discounting factor (e.g., 0.99).
epsilon – Ensures numerical stability and avoids division by zero (e.g., 1e-8).
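The following is a minimal sketch of how gamma and epsilon typically interact in this kind of return-based reward scaling. It is not the Maze implementation: the class name RunningReturnNormalizer, its attributes, and the use of Welford's algorithm for the running variance are assumptions made for illustration only.

```python
import math


class RunningReturnNormalizer:
    """Hypothetical helper (not part of Maze): scales each reward by the
    running standard deviation of the discounted return."""

    def __init__(self, gamma: float = 0.99, epsilon: float = 1e-8):
        self.gamma = gamma      # discounting factor
        self.epsilon = epsilon  # guards against division by zero
        self.ret = 0.0          # discounted return accumulated in the current episode
        self.count = 0          # number of returns observed so far
        self.mean = 0.0         # running mean of the discounted return
        self.m2 = 0.0           # running sum of squared deviations (Welford)

    def reset(self) -> None:
        # A new episode starts: clear the return accumulator but keep the statistics.
        self.ret = 0.0

    def normalize(self, reward: float) -> float:
        # Update the discounted return with the latest step reward.
        self.ret = self.gamma * self.ret + reward

        # Welford update of the running mean and variance of the return.
        self.count += 1
        delta = self.ret - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (self.ret - self.mean)
        var = self.m2 / self.count

        # Divide the raw step reward by the standard deviation of the return.
        return reward / math.sqrt(var + self.epsilon)
```

In this sketch the reward is only scaled, not mean-centered, so its sign is preserved. While the variance estimate is still zero (for example on the very first step), epsilon alone determines the scale.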
clone_from(env: maze.core.wrappers.return_normalization_reward_wrapper.ReturnNormalizationRewardWrapper) → None
(overrides SimulatedEnvMixin) Implementation of SimulatedEnvMixin.
reset()
Implementation of RewardWrapper.
reward(reward: float) → float
(overrides RewardWrapper) Implementation of RewardWrapper.