no code implementations • 9 Mar 2024 • Yang Peng, Liangyu Zhang, Zhihua Zhang
In the tabular case, \citet{rowland2018analysis} and \citet{rowland2023analysis} proved the asymptotic convergence of two instances of distributional TD, namely the categorical temporal difference (CTD) and quantile temporal difference (QTD) algorithms, respectively.
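As a rough illustration of the CTD update analysed in this line of work, here is a minimal tabular sketch of the standard categorical projection step; the layout and names (`ctd_update`, `atoms`, `alpha`) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def ctd_update(p, s, r, s_next, atoms, alpha, gamma):
    """One tabular categorical TD (CTD) update for policy evaluation.

    p     : (num_states, num_atoms) categorical probabilities per state
    atoms : (num_atoms,) fixed support z_1 < ... < z_m (assumed equally spaced)
    Sketch of the standard categorical projection, not the authors' implementation.
    """
    z_min, z_max = atoms[0], atoms[-1]
    dz = atoms[1] - atoms[0]
    target = np.zeros_like(atoms)

    # Project the bootstrapped target r + gamma * Z(s') back onto the fixed support.
    for z_j, prob_j in zip(atoms, p[s_next]):
        tz = np.clip(r + gamma * z_j, z_min, z_max)
        b = (tz - z_min) / dz                     # fractional index of tz on the support
        lo, hi = int(np.floor(b)), int(np.ceil(b))
        if lo == hi:                              # tz lands exactly on an atom
            target[lo] += prob_j
        else:                                     # split mass between neighbouring atoms
            target[lo] += prob_j * (hi - b)
            target[hi] += prob_j * (b - lo)

    # Incremental mixture toward the projected target distribution.
    p[s] = (1 - alpha) * p[s] + alpha * target
    return p
```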
1 code implementation • 29 Sep 2023 • Liangyu Zhang, Yang Peng, Jiadong Liang, Wenhao Yang, Zhihua Zhang
This implies that the distributional policy evaluation problem can be solved in a sample-efficient manner.
Distributional Reinforcement Learning • Reinforcement Learning
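For intuition about what distributional policy evaluation estimates, a minimal Monte-Carlo sketch under an assumed generative-model interface is shown below; `sample_step` and `policy` are hypothetical names, and the paper's sample-efficient estimator is not reproduced here.

```python
import numpy as np

def empirical_return_distribution(sample_step, s0, policy, gamma, horizon, n_rollouts, seed=None):
    """Monte-Carlo estimate of the return distribution at state s0.

    sample_step(s, a, rng) -> (reward, next_state) is a hypothetical generative-model
    interface; policy(s, rng) -> action is likewise illustrative.
    """
    rng = np.random.default_rng(seed)
    returns = np.empty(n_rollouts)
    for i in range(n_rollouts):
        s, g, disc = s0, 0.0, 1.0
        for _ in range(horizon):                 # truncate at an effective horizon
            a = policy(s, rng)
            r, s = sample_step(s, a, rng)
            g += disc * r
            disc *= gamma
        returns[i] = g
    return returns                               # empirical samples of the random return
```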
1 code implementation • 29 Apr 2023 • Liangyu Zhang, Yang Peng, Wenhao Yang, Zhihua Zhang
To the best of our knowledge, we are the first to apply tools from semi-infinite programming (SIP) to solve constrained reinforcement learning problems.
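Semi-infinite programs have finitely many decision variables but infinitely many constraints. The generic exchange (cutting-plane) sketch below conveys the idea, assuming the infinite constraint index set can be approximated by a grid; the function names and the reduction of the constrained-RL problem to this form are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def sip_exchange(objective, constraint, param_grid, x0, tol=1e-6, max_iter=50):
    """Exchange method for min_x objective(x) s.t. constraint(x, t) <= 0 for all t.

    In the constrained-RL setting, x would be a policy parameter and t would index
    the infinitely many constraints; here the index set is a fine grid (param_grid).
    """
    active = [param_grid[0]]                     # start with a single constraint
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        cons = [{'type': 'ineq', 'fun': (lambda x, t=t: -constraint(x, t))}
                for t in active]                 # SLSQP expects g(x) >= 0
        x = minimize(objective, x, method='SLSQP', constraints=cons).x

        # Find the most violated constraint over the discretized index set.
        violations = np.array([constraint(x, t) for t in param_grid])
        worst = int(np.argmax(violations))
        if violations[worst] <= tol:             # all constraints (nearly) satisfied
            return x
        active.append(param_grid[worst])         # exchange step: add the violated index
    return x
```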
no code implementations • 12 Sep 2022 • Miao Lu, Wenhao Yang, Liangyu Zhang, Zhihua Zhang
Specifically, we propose a two-stage estimator based on instrumental variables and establish its statistical properties in confounded MDPs with a linear structure.
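For readers unfamiliar with two-stage instrumental-variable estimation, the classical two-stage least squares (2SLS) recipe below conveys the idea; the paper's estimator for confounded linear MDPs is more specialized and is not reproduced here.

```python
import numpy as np

def two_stage_least_squares(Z, X, Y):
    """Textbook 2SLS with instruments Z, endogenous features X, outcomes Y."""
    # Stage 1: regress the (endogenous) features X on the instruments Z.
    first_stage, *_ = np.linalg.lstsq(Z, X, rcond=None)
    X_hat = Z @ first_stage                      # fitted, "de-confounded" features

    # Stage 2: regress the outcome Y on the fitted features.
    beta, *_ = np.linalg.lstsq(X_hat, Y, rcond=None)
    return beta
```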
no code implementations • 9 May 2021 • Wenhao Yang, Liangyu Zhang, Zhihua Zhang
In this paper, we study the non-asymptotic and asymptotic performance of the optimal robust policy and value function of robust Markov Decision Processes (MDPs), where the optimal robust policy and value function are computed using only a generative model.
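A minimal robust value iteration sketch is given below, under the simplifying assumption that the uncertainty set of transition kernels is a finite collection built from generative-model samples; the paper's analysis covers more general uncertainty sets, so the names and setup here are illustrative only.

```python
import numpy as np

def robust_value_iteration(P_set, R, gamma, n_iters=500):
    """Robust value iteration over a finite uncertainty set of transition kernels.

    P_set : list of (S, A, S) candidate transition models (e.g., empirical models
            estimated from generative-model samples); R : (S, A) reward matrix.
    """
    S, A = R.shape
    V = np.zeros(S)
    for _ in range(n_iters):
        # Worst-case expected next-state value over the uncertainty set, per (s, a).
        worst_next = np.min([P @ V for P in P_set], axis=0)   # shape (S, A)
        Q = R + gamma * worst_next
        V = Q.max(axis=1)                                     # robust optimal value
    policy = Q.argmax(axis=1)                                 # greedy robust policy
    return V, policy
```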
no code implementations • 1 Jan 2021 • Jiadong Liang, Liangyu Zhang, Cheng Zhang, Zhihua Zhang
In this paper, we propose a novel approach for stabilizing the training process of Generative Adversarial Networks as well as alleviating the mode collapse problem.