Finite-Time Analysis for Double Q-learning

Although Q-learning is one of the most successful algorithms for finding the best action-value function (and thus the optimal policy) in reinforcement learning, its implementation often suffers from large overestimation of Q-function values incurred by random sampling. The double Q-learning algorithm proposed in~\citet{hasselt2010double} overcomes such an overestimation issue by randomly switching the update between two Q-estimators, and has thus gained significant popularity in practice... (read more)

PDF Abstract NeurIPS 2020 PDF NeurIPS 2020 Abstract
No code implementations yet. Submit your code now


  Add Datasets introduced or used in this paper

Results from the Paper

  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods used in the Paper

Off-Policy TD Control
Double Q-learning
Off-Policy TD Control