no code implementations • NeurIPS 2021 • Lin Zhao, Huaqing Xiong, Yingbin Liang
This paper tackles the more challenging case of a constant learning rate, and develops new analytical tools that improve the existing convergence rate by orders of magnitude.
no code implementations • 1 Jan 2021 • Lin Zhao, Huaqing Xiong, Yingbin Liang, Wei Zhang
Double Q-learning (Hasselt 2010) has gained significant success in practice due to its effectiveness in overcoming the overestimation issue of Q-learning.
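To make the idea concrete, here is a minimal sketch of one tabular double Q-learning update, assuming Q-tables stored as nested lists. The function name `double_q_update`, its parameters, and the default step size and discount are illustrative assumptions, not taken from the paper. One table selects the greedy next action and the other evaluates it, which is the mechanism that counteracts overestimation.

```python
import random


def double_q_update(q_a, q_b, state, action, reward, next_state,
                    alpha=0.1, gamma=0.99, rng=random):
    """Apply one tabular double Q-learning update in place.

    q_a and q_b are lists of lists indexed as q[state][action].
    All names and defaults here are hypothetical, for illustration only.
    """
    # Choose at random which table to update; the other one evaluates.
    if rng.random() < 0.5:
        updated, evaluator = q_a, q_b
    else:
        updated, evaluator = q_b, q_a

    # Select the greedy next action with the table being updated...
    n_actions = len(updated[next_state])
    best = max(range(n_actions), key=lambda a: updated[next_state][a])

    # ...but take its value from the other table.
    target = reward + gamma * evaluator[next_state][best]
    updated[state][action] += alpha * (target - updated[state][action])


# Usage: two states, two actions, both tables initialised to zero.
q_a = [[0.0, 0.0], [0.0, 0.0]]
q_b = [[0.0, 0.0], [0.0, 0.0]]
double_q_update(q_a, q_b, state=0, action=1, reward=1.0, next_state=1)
```

Exactly one of the two tables changes per call, so each table is updated on roughly half of the samples.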
no code implementations • NeurIPS 2020 • Huaqing Xiong, Lin Zhao, Yingbin Liang, Wei Zhang
Although Q-learning is one of the most successful algorithms for finding the best action-value function (and thus the optimal policy) in reinforcement learning, its implementation often suffers from large overestimation of Q-function values incurred by random sampling.
no code implementations • 30 Jul 2020 • Bowen Weng, Huaqing Xiong, Lin Zhao, Yingbin Liang, Wei Zhang
For the infinite state-action space case, we establish the convergence guarantee for MomentumQ with linear function approximations and Markovian sampling.
no code implementations • 15 Jul 2020 • Bowen Weng, Huaqing Xiong, Yingbin Liang, Wei Zhang
In this paper, we first characterize the convergence rate for Q-AMSGrad, which is the Q-learning algorithm with the AMSGrad update (a commonly adopted alternative to Adam for theoretical analysis).
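For readers unfamiliar with AMSGrad, the sketch below shows a generic AMSGrad parameter step on plain Python lists. It is not the paper's Q-AMSGrad: how the gradient is formed from the Q-learning objective is not described in this listing, so the example only minimises a simple quadratic. The function name `amsgrad_step` and the hyperparameter defaults are assumptions chosen for illustration.

```python
import math


def amsgrad_step(theta, grad, state, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return updated parameters and optimizer state after one AMSGrad step.

    state is a tuple (m, v, v_hat) of lists with the same length as theta.
    Names and defaults are hypothetical, for illustration only.
    """
    m, v, v_hat = state
    new_theta, new_m, new_v, new_v_hat = [], [], [], []
    for t, g, m_i, v_i, vh_i in zip(theta, grad, m, v, v_hat):
        m_i = beta1 * m_i + (1.0 - beta1) * g        # first-moment estimate
        v_i = beta2 * v_i + (1.0 - beta2) * g * g    # second-moment estimate
        vh_i = max(vh_i, v_i)                        # running max: the AMSGrad change to Adam
        new_theta.append(t - lr * m_i / (math.sqrt(vh_i) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
        new_v_hat.append(vh_i)
    return new_theta, (new_m, new_v, new_v_hat)


# Usage: minimise f(x) = x^2, whose gradient is 2x, starting from x = 1.
x = [1.0]
opt_state = ([0.0], [0.0], [0.0])
for _ in range(2000):
    x, opt_state = amsgrad_step(x, [2.0 * x[0]], opt_state)
```

Keeping the running maximum `v_hat` means the effective per-coordinate step size never increases, which is the property that distinguishes AMSGrad from Adam.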
no code implementations • 15 Feb 2020 • Huaqing Xiong, Tengyu Xu, Yingbin Liang, Wei Zhang
Despite the wide applications of Adam in reinforcement learning (RL), the theoretical convergence of Adam-type RL algorithms has not been established.
no code implementations • 25 Sep 2019 • Bowen Weng, Huaqing Xiong, Yingbin Liang, Wei Zhang
Unlike the popular Deep Q-Network (DQN) learning, Alternating Q-learning (AltQ) does not fully fit a target Q-function at each iteration, and is generally known to be unstable and inefficient.
no code implementations • 7 May 2019 • Bowen Weng, Huaqing Xiong, Wei Zhang
This paper studies accelerations in Q-learning algorithms.