no code implementations • 24 Jul 2017 • A. Ömer Sarıtaç, Cem Tekin
Under the assumption that the arm triggering probabilities (ATPs) are positive for all arms, we prove that a class of upper confidence bound (UCB) policies, named Combinatorial UCB with exploration rate $\kappa$ (CUCB-$\kappa$), and Combinatorial Thompson Sampling (CTS), which estimates the expected states of the arms via Thompson sampling, achieve bounded regret.