Decorrelated Double Q-learning

12 Jun 2020 Gang Chen

Q-learning with value function approximation may have the poor performance because of overestimation bias and imprecise estimate. Specifically, overestimation bias is from the maximum operator over noise estimate, which is exaggerated using the estimate of a subsequent state... (read more)

PDF Abstract
No code implementations yet. Submit your code now

Datasets


Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods used in the Paper


METHOD TYPE
Double Q-learning
Off-Policy TD Control
Q-Learning
Off-Policy TD Control