
Clipped Double Q-learning

Introduced by Fujimoto et al. in Addressing Function Approximation Error in Actor-Critic Methods

Clipped Double Q-learning is a variant of Double Q-learning that upper-bounds the less biased Q-estimate $Q_{\theta_{2}}$ by the biased estimate $Q_{\theta_{1}}$. This is equivalent to taking the minimum of the two estimates, resulting in the target update:

$$ y_{1} = r + \gamma\min_{i=1,2}Q_{\theta'_{i}}\left(s', \pi_{\phi_{1}}\left(s'\right)\right) $$

The motivation for this extension is that vanilla Double Q-learning can be ineffective when the target and current networks are too similar, e.g., with a slowly changing policy in an actor-critic framework.
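As a concrete illustration, the target is straightforward to compute from a pair of target critics. The sketch below is a minimal PyTorch rendering under assumed module names (`actor`, `critic1_target`, `critic2_target`) and call signatures; it is not the authors' reference implementation.

```python
import torch

def clipped_double_q_target(reward, next_state, done,
                            actor, critic1_target, critic2_target,
                            gamma=0.99):
    """Compute y = r + gamma * min_i Q'_i(s', pi(s')).

    `actor` and the two target critics are assumed to be torch.nn.Module
    instances with actor(s) -> a and critic(s, a) -> Q(s, a); these names
    and signatures are illustrative, not taken from the paper's code.
    """
    with torch.no_grad():                              # no gradient through the target
        next_action = actor(next_state)                # pi_phi1(s')
        q1 = critic1_target(next_state, next_action)   # Q_theta1'(s', a')
        q2 = critic2_target(next_state, next_action)   # Q_theta2'(s', a')
        q_min = torch.min(q1, q2)                      # the "clip": keep the lower estimate
        return reward + gamma * (1.0 - done) * q_min   # zero bootstrap at terminal states
```

Both critics are then regressed toward this single target $y_{1}$, which suppresses the overestimation bias that a single bootstrapped critic tends to accumulate.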


Latest Papers

Learning to Play Soccer From Scratch: Sample-Efficient Emergent Coordination through Curriculum-Learning and Competition
Pavan Samtani, Francisco Leiva, Javier Ruiz-del-Solar
2021-03-09
Application of twin delayed deep deterministic policy gradient learning for the control of transesterification process
Tanuja Joshi, Shikhar Makker, Hariprasad Kodamana, Harikumar Kandath
2021-02-25
Memory-based Deep Reinforcement Learning for POMDP
Lingheng Meng, Rob Gorbet, Dana Kulić
2021-02-24
OffCon$^3$: What is state of the art anyway?
Philip J. Ball, Stephen J. Roberts
2021-01-27
Learning Synthetic Environments for Reinforcement Learning with Evolution Strategies
Fabio Ferreira, Thomas Nierhoff, Frank Hutter
2021-01-24
GST: Group-Sparse Training for Accelerating Deep Reinforcement Learning
JuHyoung Lee, Sangyeob Kim, Sangjin Kim, Wooyoung Jo, Hoi-jun Yoo
2021-01-24
PGPS: Coupling Policy Gradient with Population-based Search
Anonymous
2021-01-01
Policy Gradient RL Algorithms as Directed Acyclic Graphs
Juan Jose Garau Luis
2020-12-14
OPAC: Opportunistic Actor-Critic
Srinjoy Roy, Saptam Bakshi, Tamal Maharaj
2020-12-11
Efficient Reservoir Management through Deep Reinforcement Learning
Xinrun Wang, Tarun Nair, Haoyang Li, Yuh Sheng Reuben Wong, Nachiket Kelkar, Srinivas Vaidyanathan, Rajat Nayak, Bo An, Jagdish Krishnaswamy, Milind Tambe
2020-12-07
FinRL: A Deep Reinforcement Learning Library for Automated Stock Trading in Quantitative Finance
Xiao-Yang Liu, Hongyang Yang, Qian Chen, Runjia Zhang, Liuqing Yang, Bowen Xiao, Christina Dan Wang
2020-11-19
Tonic: A Deep Reinforcement Learning Library for Fast Prototyping and Benchmarking
Fabio Pardo
2020-11-15
RealAnt: An Open-Source Low-Cost Quadruped for Research in Real-World Reinforcement Learning
Rinu Boney, Jussi Sainio, Mikko Kaivola, Arno Solin, Juho Kannala
2020-11-05
Hindsight Experience Replay with Kronecker Product Approximate Curvature
Dhuruva Priyan G M, Abhik Singla, Shalabh Bhatnagar
2020-10-09
Sample-Efficient Automated Deep Reinforcement Learning
Jörg K. H. Franke, Gregor Köhler, André Biedenkapp, Frank Hutter
2020-09-03
Collision Avoidance Robotics Via Meta-Learning (CARML)
Abhiram Iyer, Aravind Mahadevan
2020-07-16
Regularly Updated Deterministic Policy Gradient Algorithm
Shuai Han, Wenbo Zhou, Shuai Lü, Jiayu Yu
2020-07-01
Noise, overestimation and exploration in Deep Reinforcement Learning
Rafael Stekolshchik
2020-06-25
Reducing Estimation Bias via Weighted Delayed Deep Deterministic Policy Gradient
Qiang He, Xinwen Hou
2020-06-18
Generalized State-Dependent Exploration for Deep Reinforcement Learning in Robotics
Antonin Raffin, Freek Stulp
2020-05-12
PBCS: Efficient Exploration and Exploitation Using a Synergy between Reinforcement Learning and Motion Planning
Guillaume Matheron, Nicolas Perrin, Olivier Sigaud
2020-04-24
Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning
Shangtong Zhang, Bo Liu, Shimon Whiteson
2020-04-22
Online Meta-Critic Learning for Off-Policy Actor-Critic Methods
Wei Zhou, Yiying Li, Yongxin Yang, Huaimin Wang, Timothy M. Hospedales
2020-03-11
Interpretable End-to-end Urban Autonomous Driving with Latent Deep Reinforcement Learning
Jianyu Chen, Shengbo Eben Li, Masayoshi Tomizuka
2020-01-23
Ctrl-Z: Recovering from Instability in Reinforcement Learning
Vibhavari Dasagi, Jake Bruce, Thierry Peynot, Jürgen Leitner
2019-10-09
Composite Q-learning: Multi-scale Q-function Decomposition and Separable Optimization
Gabriel Kalweit, Maria Huegle, Joschka Boedecker
2019-09-30
Proximal Distilled Evolutionary Reinforcement Learning
Cristian Bodnar, Ben Day, Pietro Lió
2019-06-24
Exploring Model-based Planning with Policy Networks
Tingwu Wang, Jimmy Ba
2019-06-20
Collaborative Evolutionary Reinforcement Learning
Shauharda Khadka, Somdeb Majumdar, Tarek Nassar, Zach Dwiel, Evren Tumer, Santiago Miret, Yinyin Liu, Kagan Tumer
2019-05-02
CrossNorm: Normalization for Off-Policy TD Reinforcement Learning
Aditya Bhatt, Max Argus, Artemij Amiranashvili, Thomas Brox
2019-02-14
Addressing Function Approximation Error in Actor-Critic Methods
Scott Fujimoto, Herke van Hoof, David Meger
2018-02-26

Tasks

TASK                      PAPERS  SHARE
Continuous Control        10      40.00%
OpenAI Gym                4       16.00%
Meta-Learning             2       8.00%
Curriculum Learning       1       4.00%
Feature Engineering       1       4.00%
Acrobot                   1       4.00%
Decision Making           1       4.00%
Stock Market Prediction   1       4.00%
Efficient Exploration     1       4.00%

Components

No components found.

Categories

Off-Policy TD Control