no code implementations • NeurIPS 2017 • Qinshi Wang, Wei Chen
Finally, we provide lower bound results showing that the factor $1/p^*$ is unavoidable for general CMAB-T problems, suggesting that the TPM condition is crucial in removing this factor.
no code implementations • 31 Jul 2014 • Wei Chen, Yajun Wang, Yang Yuan, Qinshi Wang
The objective of an online learning algorithm for CMAB is to minimize $(\alpha,\beta)$-approximation regret, which is the difference between the $\alpha\beta$ fraction of the expected reward when always playing the optimal super arm, and the expected reward of playing super arms according to the algorithm.
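The regret definition above can be made concrete with a small sketch. It assumes the regret is accumulated over a finite number of rounds and that the expected reward of each played super arm is already known. The function name `approximation_regret` and its parameters are hypothetical, chosen for illustration only, and are not taken from the paper.

```python
def approximation_regret(opt_reward, alpha, beta, played_rewards):
    """Cumulative (alpha, beta)-approximation regret (illustrative sketch).

    opt_reward:     expected reward of the optimal super arm in one round
    alpha, beta:    approximation ratio and success probability of the oracle
    played_rewards: expected reward of the super arm played in each round
    """
    rounds = len(played_rewards)
    # Benchmark: the alpha*beta fraction of always playing the optimal super arm.
    benchmark = alpha * beta * opt_reward * rounds
    # Regret: benchmark minus what the algorithm's choices earn in expectation.
    return benchmark - sum(played_rewards)


# Example with made-up numbers: three rounds, optimal expected reward 10.
regret = approximation_regret(10.0, 0.5, 0.8, [3.0, 3.5, 4.0])
```

With $\alpha = \beta = 1$ this reduces to the standard regret against the exact optimum. Because the benchmark is scaled down by $\alpha\beta$, the value can be negative when the algorithm outperforms that scaled benchmark.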