no code implementations • 2 May 2024 • Bingshan Hu, Zhiming Huang, Tianyue H. Zhang, Mathias Lécuyer, Nidhi Hegde
We study Thompson Sampling-based algorithms for stochastic bandits with bounded rewards.
no code implementations • 16 Feb 2021 • Bingshan Hu, Zhiming Huang, Nishant A. Mehta
Specifically, for the problem of decision-theoretic online learning with stochastic rewards, we present the first algorithm that achieves an $O\left( \frac{\log K}{\Delta_{\min}} + \frac{\log(K) \min\{\log(\frac{1}{\Delta_{\min}}), \log(T)\}}{\epsilon} \right)$ regret bound, where $\Delta_{\min}$ is the minimum mean reward gap.
no code implementations • 14 May 2020 • Zhiming Huang, Yifan Xu, Bingshan Hu, QiPeng Wang, Jianping Pan
We study the combinatorial sleeping multi-armed semi-bandit problem with long-term fairness constraints (CSMAB-F).