no code implementations • 27 Feb 2024 • Jincheng Mei, Zixin Zhong, Bo Dai, Alekh Agarwal, Csaba Szepesvari, Dale Schuurmans
We show that the stochastic gradient bandit algorithm converges to a globally optimal policy at an $O(1/t)$ rate, even with a constant step size.
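The algorithm in question is the classic softmax policy-gradient (REINFORCE) bandit update with a fixed learning rate. A minimal simulation sketch, assuming Gaussian rewards; the arm means, step size, and horizon below are illustrative choices, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)


def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()


def stochastic_gradient_bandit(means, steps=20000, eta=0.5):
    """Softmax policy-gradient bandit with a constant step size eta."""
    K = len(means)
    theta = np.zeros(K)  # logits of the softmax policy
    for _ in range(steps):
        pi = softmax(theta)
        a = rng.choice(K, p=pi)        # sample an arm from the policy
        r = rng.normal(means[a], 0.1)  # noisy reward for the pulled arm
        # REINFORCE gradient for the softmax parameterization:
        # grad = r * (e_a - pi), where e_a is the indicator of the pulled arm
        grad = -r * pi
        grad[a] += r
        theta += eta * grad            # constant step size, no decay
    return softmax(theta)


pi = stochastic_gradient_bandit(np.array([0.2, 0.5, 0.9]))
print(pi)  # probability mass should concentrate on the best arm (index 2)
```

Despite the constant step size, the policy concentrates on the optimal arm, which is the qualitative behavior the $O(1/t)$ global-convergence result describes.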
1 code implementation • 31 Jan 2023 • Yunlong Hou, Vincent Y. F. Tan, Zixin Zhong
Under this constraint, we design and analyze an algorithm, PASCombUCB, that minimizes the regret over a time horizon $T$.
1 code implementation • 23 Oct 2022 • Yi Wei, Zixin Zhong, Vincent Y. F. Tan
The beam alignment (BA) problem consists in accurately aligning the transmitter and receiver beams to establish a reliable communication link in wireless communication systems.
no code implementations • 9 Feb 2022 • Junwen Yang, Zixin Zhong, Vincent Y. F. Tan
This paper considers the problem of online clustering with bandit feedback.
1 code implementation • 25 Jan 2022 • Yunlong Hou, Vincent Y. F. Tan, Zixin Zhong
We design and analyze VA-LUCB, a parameter-free algorithm, for identifying the best arm under the fixed-confidence setup and under a stringent constraint that the variance of the chosen arm is strictly smaller than a given threshold.
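VA-LUCB itself samples adaptively using confidence bounds; the naive uniform-sampling sketch below only illustrates the problem it solves, namely picking the highest-mean arm among arms whose variance is below a threshold. All arm parameters and sample counts here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)


def variance_constrained_best_arm(means, stds, var_threshold, samples=5000):
    """Estimate each arm's mean and variance from uniform samples, then
    return the highest-mean arm whose empirical variance is below the
    threshold.  (This is a brute-force illustration of the objective,
    not the adaptive, parameter-free VA-LUCB algorithm.)"""
    K = len(means)
    best, best_mean = None, -np.inf
    for k in range(K):
        x = rng.normal(means[k], stds[k], size=samples)
        m, v = x.mean(), x.var(ddof=1)
        if v < var_threshold and m > best_mean:
            best, best_mean = k, m
    return best


# Arm 2 has the highest mean but violates the variance constraint,
# so the feasible best arm is arm 1.
arm = variance_constrained_best_arm(
    means=[0.3, 0.6, 0.9], stds=[0.2, 0.3, 1.0], var_threshold=0.5)
print(arm)
```

The example shows why the constraint matters: the unconstrained best arm (index 2) is excluded because its variance exceeds the threshold.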
no code implementations • 16 Oct 2021 • Zixin Zhong, Wang Chi Cheung, Vincent Y. F. Tan
We study the Pareto frontier of two archetypal objectives in multi-armed bandits, namely, regret minimization (RM) and best arm identification (BAI) with a fixed horizon.
1 code implementation • 15 Oct 2020 • Zixin Zhong, Wang Chi Cheung, Vincent Y. F. Tan
When the amount of corruptions per step (CPS) is below a threshold, PSS($u$) identifies the best arm or item with probability tending to $1$ as $T\rightarrow \infty$.
no code implementations • ICML 2020 • Zixin Zhong, Wang Chi Cheung, Vincent Y. F. Tan
Finally, extensive numerical simulations corroborate the efficacy of CascadeBAI as well as the tightness of our upper bound on its time complexity.
no code implementations • 2 Oct 2018 • Zixin Zhong, Wang Chi Cheung, Vincent Y. F. Tan
While Thompson sampling (TS) algorithms have been shown to be empirically superior to Upper Confidence Bound (UCB) algorithms for cascading bandits, theoretical guarantees are only known for the latter.
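A generic Beta-Bernoulli Thompson sampling sketch for the cascading click model, in which a user scans a ranked list and clicks the first attractive item; this is a textbook illustration of the setup, not the specific TS algorithm analyzed in the paper, and the attraction weights and horizon are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)


def ts_cascade(weights, list_len=2, horizon=3000):
    """Beta-Bernoulli Thompson sampling for a cascading bandit: each
    round, recommend the list_len items with the highest sampled
    attraction probabilities; the user clicks the first attractive
    item and ignores everything after it."""
    K = len(weights)
    alpha, beta = np.ones(K), np.ones(K)       # Beta(1, 1) priors
    for _ in range(horizon):
        theta = rng.beta(alpha, beta)          # posterior samples
        ranked = np.argsort(-theta)[:list_len]  # recommended list
        for item in ranked:
            click = rng.random() < weights[item]  # Bernoulli attraction
            if click:
                alpha[item] += 1
                break                          # cascade stops at the click
            beta[item] += 1                    # examined but not clicked
    return alpha / (alpha + beta)              # posterior mean estimates


est = ts_cascade(np.array([0.2, 0.3, 0.7, 0.8]))
print(est)  # estimates for the two most attractive items should dominate
```

The cascade structure is what distinguishes this from a standard bandit: only items up to and including the first click yield feedback, so the posterior updates are censored accordingly.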