Double Q-learning: New Analysis and Sharper Finite-time Bound

1 Jan 2021  ·  Lin Zhao, Huaqing Xiong, Yingbin Liang, Wei Zhang

Double Q-learning (Hasselt 2010) has achieved significant success in practice because it effectively overcomes the overestimation issue of Q-learning. However, theoretical understanding of double Q-learning is rather limited, and the only existing finite-time analysis was recently established in (Xiong et al. 2020) under a polynomial learning rate. This paper analyzes the more challenging case of a rescaled linear learning rate, for which the previous method does not appear to be applicable. We develop new analytical tools that achieve an order-level better finite-time convergence rate than the previously established result. Specifically, we show that synchronous double Q-learning attains an $\epsilon$-accurate global optimum with a time complexity of $\Omega\left(\frac{\ln D}{(1-\gamma)^6\epsilon^2} + \frac{\sqrt{\ln D}}{(1-\gamma)^7\epsilon^2} \right)$, and the asynchronous algorithm attains a time complexity of $\Omega\left(\frac{L^6\sqrt{\ln D}}{(1-\gamma)^7\epsilon^2} \right)$, where $D$ is the cardinality of the state-action space, $\gamma$ is the discount factor, and $L$ is a parameter related to the sampling strategy for asynchronous double Q-learning. These results improve the order-level dependence of the convergence rate on all major parameters $(\epsilon,1-\gamma, D, L)$ provided in (Xiong et al. 2020). The new analysis presents a more direct and succinct approach for characterizing the finite-time convergence rate of double Q-learning.
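For readers who want a concrete picture of the algorithm the abstract analyzes, the following is a minimal sketch of synchronous tabular double Q-learning. It is not the authors' implementation. The function name `double_q_sync`, the MDP encoding (`P`, `R`), the step size $\alpha_t = 1/(1 + (1-\gamma)t)$ used as one possible "rescaled linear" schedule, and the choice of a single fair coin flip per iteration to pick which estimator is updated are all assumptions made for illustration; the abstract does not specify these details.

```python
import random


def double_q_sync(P, R, gamma, num_iters, seed=0):
    """Sketch of synchronous double Q-learning on a small tabular MDP.

    P[s][a] is a list of next-state probabilities and R[s][a] is a
    deterministic reward (both are hypothetical encodings for this example).
    Returns the two estimators (qa, qb) as nested lists.
    """
    rng = random.Random(seed)
    n_states, n_actions = len(P), len(P[0])
    qa = [[0.0] * n_actions for _ in range(n_states)]
    qb = [[0.0] * n_actions for _ in range(n_states)]

    for t in range(1, num_iters + 1):
        # Assumed step size: linear decay rescaled by (1 - gamma).
        alpha = 1.0 / (1.0 + (1.0 - gamma) * t)

        # Assumed: one fair coin per iteration picks the estimator to update.
        if rng.random() < 0.5:
            upd, other = qa, qb
        else:
            upd, other = qb, qa

        # Synchronous sweep: every (s, a) pair is updated from the old table.
        new = [row[:] for row in upd]
        for s in range(n_states):
            for a in range(n_actions):
                s_next = rng.choices(range(n_states), weights=P[s][a])[0]
                # Greedy action under the updated estimator ...
                a_star = max(range(n_actions), key=lambda b: upd[s_next][b])
                # ... evaluated with the other estimator (the "double" step).
                target = R[s][a] + gamma * other[s_next][a_star]
                new[s][a] = upd[s][a] + alpha * (target - upd[s][a])
        for s in range(n_states):
            upd[s][:] = new[s]

    return qa, qb


if __name__ == "__main__":
    # Toy two-state, two-action MDP, invented purely for illustration.
    P = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.0, 1.0]]]
    R = [[0.0, 1.0], [0.5, 1.0]]
    qa, qb = double_q_sync(P, R, gamma=0.9, num_iters=2000)
    print(qa)
    print(qb)
```

Selecting the greedy action with one estimator and evaluating it with the other is what distinguishes double Q-learning from standard Q-learning, where a single table plays both roles and the max operator tends to overestimate.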
