no code implementations • 23 Jan 2019 • Chao Gan, Jing Yang, Ruida Zhou, Cong Shen
We aim to show that when the user preferences are sufficiently diverse and each arm can be optimal for certain users, the O(log T) regret incurred by exploring the sub-optimal arms under the standard stochastic MAB setting can be reduced to a constant.
no code implementations • 22 May 2018 • Ruida Zhou, Chao Gan, Jing Yang, Cong Shen
For the online setting, we propose a Cost-aware Cascading Upper Confidence Bound (CC-UCB) algorithm, and show that the cumulative regret scales in O(log T).
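For orientation only, the following is a minimal sketch of the standard UCB1 index policy on Bernoulli arms, the textbook setting in which O(log T) regret arises. It is not the CC-UCB algorithm from the paper: it has no costs and no cascading feedback. The function name `ucb1`, the arm means, the horizon, and the seed are all illustrative assumptions.

```python
import math
import random


def ucb1(means, horizon, seed=0):
    """Run plain UCB1 on Bernoulli arms (illustrative, not CC-UCB).

    Returns the per-arm pull counts and the cumulative pseudo-regret.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # number of pulls per arm
    sums = [0.0] * k   # total observed reward per arm
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            # Pull each arm once so every empirical mean is defined.
            arm = t - 1
        else:
            # Pick the arm with the largest empirical mean plus confidence bonus.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        # Pseudo-regret: gap between the best mean and the chosen arm's mean.
        regret += best - means[arm]
    return counts, regret


# Hypothetical three-arm instance.
counts, regret = ucb1([0.3, 0.5, 0.7], horizon=5000)
print(counts, round(regret, 1))
```

The confidence bonus shrinks as an arm is pulled more often, so sub-optimal arms are revisited only occasionally; this is the mechanism behind logarithmic rather than linear regret growth.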
no code implementations • 11 Apr 2018 • Chao Gan, Ruida Zhou, Jing Yang, Cong Shen
Our objective is to understand how the costs and rewards of the actions affect the optimal behavior of the user in both offline and online settings, and to design the corresponding opportunistic spectrum access strategies to maximize the expected cumulative net reward (i.e., reward minus cost).