Regularization

Target Policy Smoothing

Introduced by Fujimoto et al. in Addressing Function Approximation Error in Actor-Critic Methods

Target Policy Smoothing is a regularization strategy for the value function in reinforcement learning. Deterministic policies can overfit to narrow peaks in the value estimate, which makes the learning target highly susceptible to function approximation error and increases the variance of the target. To reduce this variance, target policy smoothing adds a small amount of random noise to the target policy's action and averages over mini-batches, approximating a SARSA-like expectation over actions near the target action.

The modified target update is:

$$ y = r + \gamma{Q}_{\theta'}\left(s', \pi_{\theta'}\left(s'\right) + \epsilon \right) $$

$$ \epsilon \sim \text{clip}\left(\mathcal{N}\left(0, \sigma\right), -c, c \right) $$

where the added noise is clipped to keep the target close to the original action. The outcome is an algorithm reminiscent of Expected SARSA, where the value estimate is instead learned off-policy and the noise added to the target policy is chosen independently of the exploration policy. The value estimate learned is with respect to a noisy policy defined by the parameter $\sigma$.
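Below is a minimal sketch of how the smoothed target can be computed in PyTorch. The placeholder target networks (actor_target, critic_target), the hyperparameter values, and the clamping of the perturbed action to a valid range are illustrative assumptions, not part of the description above.

```python
import torch
import torch.nn as nn

# Placeholder target networks (illustrative assumptions, not from the source).
state_dim, action_dim, max_action = 3, 1, 1.0
actor_target = nn.Sequential(nn.Linear(state_dim, 32), nn.ReLU(),
                             nn.Linear(32, action_dim), nn.Tanh())
critic_target = nn.Sequential(nn.Linear(state_dim + action_dim, 32), nn.ReLU(),
                              nn.Linear(32, 1))

# sigma and c parameterize the clipped noise; gamma is the discount factor.
# The values here are typical choices, not prescribed by this page.
sigma, c, gamma = 0.2, 0.5, 0.99

def smoothed_target(reward, next_state):
    """Compute y = r + gamma * Q_target(s', pi_target(s') + eps), per the equations above."""
    with torch.no_grad():
        # eps ~ clip(N(0, sigma), -c, c): keeps the perturbed action close to the original one.
        eps = (torch.randn(next_state.shape[0], action_dim) * sigma).clamp(-c, c)
        # Clamping to the valid action range is a common practical detail,
        # not shown in the formula above.
        next_action = (max_action * actor_target(next_state) + eps).clamp(-max_action, max_action)
        target_q = critic_target(torch.cat([next_state, next_action], dim=1))
        return reward + gamma * target_q

# Averaging the critic loss over a mini-batch of such targets approximates
# the SARSA-like expectation over actions near pi_target(s').
y = smoothed_target(torch.rand(8, 1), torch.randn(8, state_dim))
```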

Source: Addressing Function Approximation Error in Actor-Critic Methods

Tasks


Task Papers Share
Reinforcement Learning (RL) 58 40.56%
Continuous Control 26 18.18%
OpenAI Gym 8 5.59%
Decision Making 7 4.90%
Autonomous Driving 5 3.50%
Offline RL 3 2.10%
Meta-Learning 3 2.10%
Benchmarking 3 2.10%
D4RL 2 1.40%
