On the Estimation Bias in Double Q-Learning

1 Jan 2021 Anonymous

Double Q-learning is a classical method for reducing overestimation bias, which is caused by taking maximum estimated values in the Bellman operator. Its variants in the deep Q-learning paradigm have shown great promise in producing reliable value prediction and improving learning performance...
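
The overestimation effect described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: the function name `max_bias_demo`, the Gaussian noise model, and all parameter values are assumptions chosen for illustration. Every action has a true value of zero, so any nonzero average is pure estimation bias. The single estimator takes the maximum over one set of noisy estimates; the double estimator selects the action with one set and evaluates it with an independent second set, which is the core idea behind Double Q-learning.

```python
import random


def max_bias_demo(n_actions=10, n_trials=2000, seed=0):
    """Compare single-max and double estimators when all true values are 0.

    Hypothetical illustration: noise is standard Gaussian, and the two
    estimate sets are drawn independently.
    """
    rng = random.Random(seed)
    single_total = 0.0
    double_total = 0.0
    for _ in range(n_trials):
        # Two independent noisy estimates of the same (all-zero) action values.
        est_a = [rng.gauss(0.0, 1.0) for _ in range(n_actions)]
        est_b = [rng.gauss(0.0, 1.0) for _ in range(n_actions)]

        # Single estimator: select and evaluate with the same estimates.
        single_total += max(est_a)

        # Double estimator: select with est_a, evaluate with est_b.
        best = max(range(n_actions), key=lambda a: est_a[a])
        double_total += est_b[best]
    return single_total / n_trials, double_total / n_trials


if __name__ == "__main__":
    single, double = max_bias_demo()
    print(f"single-max estimate of max value: {single:.3f}")
    print(f"double estimate of max value:     {double:.3f}")
```

Under these assumptions the single-max average should come out well above zero, while the double estimator's average should stay close to zero, since the noise in the evaluating set is independent of the selection.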

Methods used in the Paper

Off-Policy TD Control
Double Q-learning