no code implementations • 20 Oct 2023 • Ruiquan Huang, Yuan Cheng, Jing Yang, Vincent Tan, Yingbin Liang
To this end, we posit a joint model class for tasks and use the notion of $\eta$-bracketing number to quantify its complexity; this number also serves as a general metric to capture the similarity of tasks and thus determines the benefit of multi-task over single-task RL.
no code implementations • 21 Aug 2023 • Xi Li, Songhe Wang, Ruiquan Huang, Mahanth Gowda, George Kesidis
Although there are extensive studies on backdoor attacks against image data, the susceptibility of video-based systems under backdoor attacks remains largely unexplored.
no code implementations • 1 Jul 2023 • Ruiquan Huang, Yingbin Liang, Jing Yang
The general sequential decision-making problem, which includes Markov decision processes (MDPs) and partially observable MDPs (POMDPs) as special cases, aims at maximizing a cumulative reward by making a sequence of decisions based on a history of observations and actions over time.
no code implementations • 14 Jun 2023 • Xizixiang Wei, Tianhao Wang, Ruiquan Huang, Cong Shen, Jing Yang, H. Vincent Poor
A new federated learning (FL) convergence bound is derived that, combined with the privacy guarantees, allows for a smooth tradeoff between the achieved convergence rate and differential privacy levels.
no code implementations • 9 Jun 2023 • Donghao Li, Ruiquan Huang, Cong Shen, Jing Yang
This paper investigates conservative exploration in reinforcement learning, where the performance of the learning agent is guaranteed to stay above a certain threshold throughout the learning process.
no code implementations • 8 Jun 2023 • Ruiquan Huang, Huanyu Zhang, Luca Melis, Milan Shen, Meisam Hejazinia, Jing Yang
This paper studies federated linear contextual bandits under the notion of user-level differential privacy (DP).
no code implementations • 1 Jun 2023 • Songtao Feng, Ming Yin, Ruiquan Huang, Yu-Xiang Wang, Jing Yang, Yingbin Liang
To the best of our knowledge, this is the first dynamic regret analysis in non-stationary MDPs with general function approximation.
no code implementations • 20 Mar 2023 • Yuan Cheng, Ruiquan Huang, Jing Yang, Yingbin Liang
In this work, we provide the first known sample complexity lower bound that holds for any algorithm under low-rank MDPs.
no code implementations • 28 Jun 2022 • Ruiquan Huang, Jing Yang, Yingbin Liang
In particular, we consider the scenario where a safe baseline policy is known beforehand, and propose a unified Safe reWard-frEe ExploraTion (SWEET) framework.
no code implementations • NeurIPS 2021 • Ruiquan Huang, Weiqiang Wu, Jing Yang, Cong Shen
This paper presents a novel federated linear contextual bandits model, where individual clients face different $K$-armed stochastic bandits coupled through common global parameters.