Although Q-learning is one of the most successful algorithms for finding the optimal action-value function (and thus the optimal policy) in reinforcement learning, its implementation often suffers from large overestimation of Q-function values incurred by random sampling. The double Q-learning algorithm proposed by van Hasselt (2010) overcomes this overestimation issue by randomly switching the update between two Q-estimators, and has thus gained significant popularity in practice...
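To make the overestimation issue and the two-estimator fix concrete, here is a minimal tabular sketch. It assumes a hypothetical toy problem that the abstract does not describe: state 0 has a single action that leads to state 1 with reward 0, and state 1 has `N_ACTIONS` actions that each end the episode with a reward drawn from N(0, 1). The true value of state 0 is therefore exactly 0. The names `run`, `actions`, `N_ACTIONS` and all hyperparameters are illustrative assumptions. The double Q-learning update follows the usual tabular form: a fair coin picks which table to update, that table selects the greedy next action, and the other table evaluates it.

```python
import random
from collections import defaultdict

N_ACTIONS = 10  # hypothetical number of noisy actions in state 1


def actions(s):
    # Toy MDP (assumption): one action in state 0, N_ACTIONS actions in state 1.
    return [0] if s == 0 else list(range(N_ACTIONS))


def q_learning_update(q, s, a, r, s_next, alpha, gamma, done):
    """Standard Q-learning: the same table selects AND evaluates the next action."""
    target = r if done else r + gamma * max(q[(s_next, b)] for b in actions(s_next))
    q[(s, a)] += alpha * (target - q[(s, a)])


def double_q_update(qa, qb, s, a, r, s_next, alpha, gamma, done, rng):
    """Double Q-learning: a coin flip chooses which table to update; that table
    picks the greedy next action and the other table evaluates it."""
    q_upd, q_eval = (qa, qb) if rng.random() < 0.5 else (qb, qa)
    if done:
        target = r
    else:
        a_star = max(actions(s_next), key=lambda b: q_upd[(s_next, b)])
        target = r + gamma * q_eval[(s_next, a_star)]
    q_upd[(s, a)] += alpha * (target - q_upd[(s, a)])


def run(episodes=40000, burn_in=2000, alpha=0.1, gamma=0.99, seed=0):
    """Return time-averaged estimates of Q(0, 0) for both methods (true value: 0)."""
    rng = random.Random(seed)
    q = defaultdict(float)                       # Q-learning table
    qa, qb = defaultdict(float), defaultdict(float)  # double Q-learning tables
    q_sum = dq_sum = 0.0
    for ep in range(episodes):
        # Step 1: state 0 -> state 1 with reward 0 (not terminal).
        q_learning_update(q, 0, 0, 0.0, 1, alpha, gamma, False)
        double_q_update(qa, qb, 0, 0, 0.0, 1, alpha, gamma, False, rng)
        # Step 2: uniformly random action in state 1, zero-mean noisy reward, episode ends.
        b = rng.randrange(N_ACTIONS)
        r = rng.gauss(0.0, 1.0)
        q_learning_update(q, 1, b, r, None, alpha, gamma, True)
        double_q_update(qa, qb, 1, b, r, None, alpha, gamma, True, rng)
        if ep >= burn_in:
            q_sum += q[(0, 0)]
            dq_sum += 0.5 * (qa[(0, 0)] + qb[(0, 0)])
    n = episodes - burn_in
    return q_sum / n, dq_sum / n


if __name__ == "__main__":
    q_est, dq_est = run()
    print(f"Q-learning estimate of Q(0,0):        {q_est:.3f}")
    print(f"Double Q-learning estimate of Q(0,0): {dq_est:.3f}")
```

In this setup the Q-learning estimate should settle clearly above the true value 0, because the maximum over noisy zero-mean estimates is biased upward. The double Q-learning estimate should stay close to 0, because the table that selects the action is built from samples disjoint from those of the table that evaluates it.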
