1 code implementation • 19 Aug 2023 • Canzhe Zhao, Yanjie Ze, Jing Dong, Baoxiang Wang, Shuai Li
Communication lays the foundation for cooperation in human society and in multi-agent reinforcement learning (MARL).
no code implementations • 13 Mar 2023 • Fang Kong, Canzhe Zhao, Shuai Li
Follow-the-regularized-leader (FTRL) is another popular type of algorithm that can adapt to different environments.
1 code implementation • 21 Aug 2022 • Zhihui Xie, Tong Yu, Canzhe Zhao, Shuai Li
To enable users to provide comparative preferences during conversational interactions, we propose a novel comparison-based conversational recommender system.
no code implementations • 12 Jul 2022 • Cheng Chen, Canzhe Zhao, Shuai Li
This work studies the online learning to rank (OLTR) problem in both stochastic and adversarial environments under the position-based model (PBM).
no code implementations • 25 Jan 2022 • Canzhe Zhao, Yanjie Ze, Jing Dong, Baoxiang Wang, Shuai Li
Temporal difference (TD) learning is a widely used method to evaluate policies in reinforcement learning.
no code implementations • 17 Apr 2021 • Kun Wang, Canzhe Zhao, Shuai Li, Shuo Shao
We propose the novel conservative contextual combinatorial cascading bandit ($C^4$-bandit), a cascading online learning game which incorporates the conservative mechanism.