Search Results for author: Canzhe Zhao

Found 6 papers, 2 papers with code

DPMAC: Differentially Private Communication for Cooperative Multi-Agent Reinforcement Learning

1 code implementation • 19 Aug 2023 • Canzhe Zhao, Yanjie Ze, Jing Dong, Baoxiang Wang, Shuai Li

Communication lays the foundation for cooperation in human society and in multi-agent reinforcement learning (MARL).

Multi-agent Reinforcement Learning • Privacy Preserving • +1

Best-of-three-worlds Analysis for Linear Bandits with Follow-the-regularized-leader Algorithm

no code implementations • 13 Mar 2023 • Fang Kong, Canzhe Zhao, Shuai Li

Follow-the-regularized-leader (FTRL) is a popular type of algorithm that can adapt to different environments.

Vocal Bursts Type Prediction

Comparison-based Conversational Recommender System with Relative Bandit Feedback

1 code implementation • 21 Aug 2022 • Zhihui Xie, Tong Yu, Canzhe Zhao, Shuai Li

To enable users to provide comparative preferences during conversational interactions, we propose a novel comparison-based conversational recommender system.

Recommendation Systems

Simultaneously Learning Stochastic and Adversarial Bandits under the Position-Based Model

no code implementations • 12 Jul 2022 • Cheng Chen, Canzhe Zhao, Shuai Li

This work studies the online learning to rank (OLTR) problem in both stochastic and adversarial environments under the position-based model (PBM).

Learning-To-Rank • Position

Differentially Private Temporal Difference Learning with Stochastic Nonconvex-Strongly-Concave Optimization

no code implementations • 25 Jan 2022 • Canzhe Zhao, Yanjie Ze, Jing Dong, Baoxiang Wang, Shuai Li

Temporal difference (TD) learning is a widely used method to evaluate policies in reinforcement learning.

OpenAI Gym

Conservative Contextual Combinatorial Cascading Bandit

no code implementations • 17 Apr 2021 • Kun Wang, Canzhe Zhao, Shuai Li, Shuo Shao

We propose the novel conservative contextual combinatorial cascading bandit ($C^4$-bandit), a cascading online learning game which incorporates the conservative mechanism.

Decision Making • Recommendation Systems
