
Target Policy Smoothing

Introduced by Fujimoto et al. in Addressing Function Approximation Error in Actor-Critic Methods

Target Policy Smoothing is a regularization strategy for the value function in reinforcement learning. Deterministic policies can overfit to narrow peaks in the value estimate, which makes them highly susceptible to function approximation error and increases the variance of the target. To reduce this variance, target policy smoothing adds a small amount of random noise to the target policy's action and averages over mini-batches, approximating a SARSA-like expectation over actions close to the target action.

The modified target update is:

$$ y = r + \gamma Q_{\theta'}\left(s', \pi_{\theta'}\left(s'\right) + \epsilon \right) $$

$$ \epsilon \sim \text{clip}\left(\mathcal{N}\left(0, \sigma\right), -c, c \right) $$

where the added noise is clipped to keep the perturbed action close to the original action. The outcome is an algorithm reminiscent of Expected SARSA, except that the value estimate is learned off-policy and the noise added to the target policy is chosen independently of the exploration policy. The learned value estimate is therefore with respect to a noisy policy defined by the parameter $\sigma$.
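
As a concrete illustration, here is a minimal PyTorch sketch of the smoothed target computation. The function and argument names (`smoothed_td_target`, `actor_target`, `critic_target`, `not_done`, `max_action`) are illustrative assumptions rather than anything fixed by the source; the defaults $\sigma = 0.2$ and $c = 0.5$ match the values used in the TD3 paper.

```python
import torch

def smoothed_td_target(reward, next_state, not_done,
                       actor_target, critic_target,
                       gamma=0.99, sigma=0.2, c=0.5, max_action=1.0):
    """Sketch of a TD3-style smoothed target; names here are illustrative."""
    with torch.no_grad():
        next_action = actor_target(next_state)
        # epsilon ~ clip(N(0, sigma), -c, c): clipped Gaussian noise keeps the
        # perturbed action close to the target policy's action.
        eps = (torch.randn_like(next_action) * sigma).clamp(-c, c)
        # Keep the perturbed action inside the valid action range (assumed
        # here to be [-max_action, max_action]).
        smoothed_action = (next_action + eps).clamp(-max_action, max_action)
        # y = r + gamma * Q_{theta'}(s', pi_{theta'}(s') + eps)
        y = reward + not_done * gamma * critic_target(next_state, smoothed_action)
    return y
```

Averaging this target over a mini-batch, with fresh noise per sample, produces the smoothing effect described above: the critic fits the value of a small region around the target action rather than a single point.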

Source: Addressing Function Approximation Error in Actor-Critic Methods

Latest Papers

PAPER / AUTHORS / DATE

Learning to Play Soccer From Scratch: Sample-Efficient Emergent Coordination through Curriculum-Learning and Competition
Pavan Samtani, Francisco Leiva, Javier Ruiz-del-Solar
2021-03-09

Application of twin delayed deep deterministic policy gradient learning for the control of transesterification process
Tanuja Joshi, Shikhar Makker, Hariprasad Kodamana, Harikumar Kandath
2021-02-25

Memory-based Deep Reinforcement Learning for POMDP
Lingheng Meng, Rob Gorbet, Dana Kulić
2021-02-24

OffCon$^3$: What is state of the art anyway?
Philip J. Ball, Stephen J. Roberts
2021-01-27

Learning Synthetic Environments for Reinforcement Learning with Evolution Strategies
Fabio Ferreira, Thomas Nierhoff, Frank Hutter
2021-01-24

GST: Group-Sparse Training for Accelerating Deep Reinforcement Learning
JuHyoung Lee, Sangyeob Kim, Sangjin Kim, Wooyoung Jo, Hoi-jun Yoo
2021-01-24

PGPS : Coupling Policy Gradient with Population-based Search
Anonymous
2021-01-01

Policy Gradient RL Algorithms as Directed Acyclic Graphs
Juan Jose Garau Luis
2020-12-14

OPAC: Opportunistic Actor-Critic
Srinjoy Roy, Saptam Bakshi, Tamal Maharaj
2020-12-11

Efficient Reservoir Management through Deep Reinforcement Learning
Xinrun Wang, Tarun Nair, Haoyang Li, Yuh Sheng Reuben Wong, Nachiket Kelkar, Srinivas Vaidyanathan, Rajat Nayak, Bo An, Jagdish Krishnaswamy, Milind Tambe
2020-12-07

FinRL: A Deep Reinforcement Learning Library for Automated Stock Trading in Quantitative Finance
Xiao-Yang Liu, Hongyang Yang, Qian Chen, Runjia Zhang, Liuqing Yang, Bowen Xiao, Christina Dan Wang
2020-11-19

Tonic: A Deep Reinforcement Learning Library for Fast Prototyping and Benchmarking
Fabio Pardo
2020-11-15

RealAnt: An Open-Source Low-Cost Quadruped for Research in Real-World Reinforcement Learning
Rinu Boney, Jussi Sainio, Mikko Kaivola, Arno Solin, Juho Kannala
2020-11-05

Hindsight Experience Replay with Kronecker Product Approximate Curvature
Dhuruva Priyan G M, Abhik Singla, Shalabh Bhatnagar
2020-10-09

Sample-Efficient Automated Deep Reinforcement Learning
Jörg K. H. Franke, Gregor Köhler, André Biedenkapp, Frank Hutter
2020-09-03

Collision Avoidance Robotics Via Meta-Learning (CARML)
Abhiram Iyer, Aravind Mahadevan
2020-07-16

Noise, overestimation and exploration in Deep Reinforcement Learning
Rafael Stekolshchik
2020-06-25

Reducing Estimation Bias via Weighted Delayed Deep Deterministic Policy Gradient
Qiang He, Xinwen Hou
2020-06-18

Generalized State-Dependent Exploration for Deep Reinforcement Learning in Robotics
Antonin Raffin, Freek Stulp
2020-05-12

PBCS : Efficient Exploration and Exploitation Using a Synergy between Reinforcement Learning and Motion Planning
Guillaume Matheron, Nicolas Perrin, Olivier Sigaud
2020-04-24

Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning
Shangtong Zhang, Bo Liu, Shimon Whiteson
2020-04-22

Online Meta-Critic Learning for Off-Policy Actor-Critic Methods
Wei Zhou, Yiying Li, Yongxin Yang, Huaimin Wang, Timothy M. Hospedales
2020-03-11

Interpretable End-to-end Urban Autonomous Driving with Latent Deep Reinforcement Learning
Jianyu Chen, Shengbo Eben Li, Masayoshi Tomizuka
2020-01-23

Ctrl-Z: Recovering from Instability in Reinforcement Learning
Vibhavari Dasagi, Jake Bruce, Thierry Peynot, Jürgen Leitner
2019-10-09

Composite Q-learning: Multi-scale Q-function Decomposition and Separable Optimization
Gabriel Kalweit, Maria Huegle, Joschka Boedecker
2019-09-30

Proximal Distilled Evolutionary Reinforcement Learning
Cristian Bodnar, Ben Day, Pietro Lió
2019-06-24

Exploring Model-based Planning with Policy Networks
Tingwu Wang, Jimmy Ba
2019-06-20

Collaborative Evolutionary Reinforcement Learning
Shauharda Khadka, Somdeb Majumdar, Tarek Nassar, Zach Dwiel, Evren Tumer, Santiago Miret, Yinyin Liu, Kagan Tumer
2019-05-02

CrossNorm: Normalization for Off-Policy TD Reinforcement Learning
Aditya Bhatt, Max Argus, Artemij Amiranashvili, Thomas Brox
2019-02-14

Addressing Function Approximation Error in Actor-Critic Methods
Scott Fujimoto, Herke van Hoof, David Meger
2018-02-26

Tasks

TASK | PAPERS | SHARE
Continuous Control | 10 | 40.00%
OpenAI Gym | 4 | 16.00%
Meta-Learning | 2 | 8.00%
Curriculum Learning | 1 | 4.00%
Feature Engineering | 1 | 4.00%
Acrobot | 1 | 4.00%
Decision Making | 1 | 4.00%
Stock Market Prediction | 1 | 4.00%
Efficient Exploration | 1 | 4.00%


Categories

Regularization