no code implementations • 12 Jun 2023 • Haozhe Jiang, Qiwen Cui, Zhihan Xiong, Maryam Fazel, Simon S. Du
Specifically, we focus on games with bandit feedback, where testing an equilibrium can incur substantial regret even when the gap being tested is small, and where the existence of multiple optimal solutions (equilibria) in stationary games poses extra challenges.
1 code implementation • 31 May 2023 • Jianhao Wang, Jin Zhang, Haozhe Jiang, Junyu Zhang, LiWei Wang, Chongjie Zhang
We find that a return-based uncertainty quantification for IDAQ performs effectively.
1 code implementation • 14 Mar 2023 • Haozhe Jiang, Kaiyue Wen, Yilei Chen
For some settings, we are also able to provide theoretical analyses that explain the rationale behind our model designs.
no code implementations • 24 Oct 2022 • Haozhe Jiang, Qiwen Cui, Zhihan Xiong, Maryam Fazel, Simon S. Du
Starting from facility-level (a.k.a. semi-bandit) feedback, we propose a novel one-unit deviation coverage condition and give a pessimism-type algorithm that can recover an approximate NE.
1 code implementation • NeurIPS 2021 • Jianhao Wang, Wenzhe Li, Haozhe Jiang, Guangxiang Zhu, Siyuan Li, Chongjie Zhang
These reverse imaginations provide informed data augmentation for model-free policy learning and enable conservative generalization beyond the offline dataset.
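The reverse-imagination idea can be caricatured in a few lines: roll a learned reverse dynamics model *backward* from real dataset states, so every imagined trajectory terminates inside the support of the offline data. The sketch below is illustrative only, not the paper's implementation; the function names and the toy linear "reverse model" are hypothetical placeholders.

```python
import numpy as np

def reverse_step(next_state, action):
    # Stand-in for a learned reverse dynamics model p(s | s', a).
    # A toy linear rule here, purely for demonstration.
    return next_state - 0.1 * action

def reverse_rollout(dataset_states, horizon=3, rng=None):
    """Generate imagined transitions that end at real dataset states."""
    rng = rng or np.random.default_rng(0)
    augmented = []
    for s_next in dataset_states:
        s = np.asarray(s_next, dtype=float)
        for _ in range(horizon):
            a = rng.normal(size=s.shape)      # toy rollout policy: random actions
            s_prev = reverse_step(s, a)       # imagine a predecessor state
            augmented.append((s_prev, a, s))  # transition (s_prev, a) -> s
            s = s_prev                        # keep stepping backward
    return augmented

transitions = reverse_rollout([np.zeros(2), np.ones(2)], horizon=2)
print(len(transitions))  # 2 seed states x horizon 2 = 4 imagined transitions
```

Because rollouts run backward into the data rather than forward out of it, the augmented transitions stay anchored to observed states, which is what enables the conservative generalization described above.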