NeurIPS 2010 • Hado V. Hasselt
We apply the double estimator to Q-learning to construct Double Q-learning, a new off-policy reinforcement learning algorithm.
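The core idea can be illustrated with a minimal tabular sketch. It assumes a small discrete state/action space stored as nested Python lists, and the function names (`double_q_update`, `epsilon_greedy`) are hypothetical. Two tables, Q^A and Q^B, are kept. On each step a coin flip picks which table to update. That table selects the greedy next action, and the other table evaluates it. Acting is epsilon-greedy on Q^A + Q^B.

```python
import random
from typing import List

QTable = List[List[float]]  # hypothetical layout: q[state][action]


def double_q_update(q_a: QTable, q_b: QTable, s: int, a: int, r: float,
                    s_next: int, done: bool, alpha: float = 0.1,
                    gamma: float = 0.99, update_a: bool = True) -> None:
    """Apply one Double Q-learning update in place (sketch).

    The table being updated picks the greedy next action (argmax);
    the other table supplies the value of that action.
    """
    q_sel, q_eval = (q_a, q_b) if update_a else (q_b, q_a)
    if done:
        target = r  # no bootstrap from a terminal state
    else:
        row = q_sel[s_next]
        a_star = max(range(len(row)), key=row.__getitem__)  # first argmax on ties
        target = r + gamma * q_eval[s_next][a_star]
    q_sel[s][a] += alpha * (target - q_sel[s][a])


def epsilon_greedy(q_a: QTable, q_b: QTable, s: int, epsilon: float,
                   rng: random.Random) -> int:
    """Pick an action epsilon-greedily with respect to Q^A + Q^B."""
    n_actions = len(q_a[s])
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    combined = [q_a[s][i] + q_b[s][i] for i in range(n_actions)]
    return max(range(n_actions), key=combined.__getitem__)


def train_step(q_a: QTable, q_b: QTable, s: int, a: int, r: float,
               s_next: int, done: bool, rng: random.Random,
               alpha: float = 0.1, gamma: float = 0.99) -> None:
    """Flip a fair coin to decide which table receives this transition."""
    double_q_update(q_a, q_b, s, a, r, s_next, done, alpha, gamma,
                    update_a=rng.random() < 0.5)
```

For example, with `q_a = [[0, 0], [1, 2]]`, `q_b = [[0, 0], [5, 3]]`, `alpha=0.5`, `gamma=1.0` and a transition from state 0 via action 0 with reward 1 to state 1, updating Q^A selects action 1 (Q^A's argmax) but bootstraps from Q^B's value of 3. This moves `q_a[0][0]` to 2.0. Plain Q-learning would instead bootstrap from Q^A's own maximum. Decoupling selection from evaluation is what the double estimator uses to counter the upward bias of taking a max over noisy estimates.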
Q-Learning reinforcement-learning