no code implementations • 1 Jun 2022 • Chloé Rouyer, Dirk van der Hoeven, Nicolò Cesa-Bianchi, Yevgeny Seldin
The algorithm combines ideas from the EXP3++ algorithm for stochastic and adversarial bandits and the EXP3.G algorithm for feedback graphs with a novel exploration scheme.
no code implementations • 19 Feb 2021 • Chloé Rouyer, Yevgeny Seldin, Nicolò Cesa-Bianchi
In the stochastically constrained adversarial regime, which includes the stochastic regime as a special case, it achieves a regret bound of $O\left(\big((\lambda K)^{2/3} T^{1/3} + \ln T\big)\sum_{i \neq i^*} \Delta_i^{-1}\right)$, where $\lambda$ is the cost paid each time the algorithm switches arms, $K$ is the number of arms, $T$ is the time horizon, $\Delta_i$ are the suboptimality gaps, and $i^*$ is the unique optimal arm.
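To see how the two terms of this bound trade off, here is a minimal Python sketch that evaluates the expression inside the $O(\cdot)$ with all constants dropped. The function name `switching_cost_regret_bound` and the convention of passing only the $K-1$ positive gaps are illustrative assumptions, not part of the paper, and the returned number shows only the shape of the bound, not an actual regret guarantee.

```python
import math
from typing import Sequence


def switching_cost_regret_bound(lam: float, K: int, T: int, gaps: Sequence[float]) -> float:
    """Evaluate ((lam*K)^(2/3) * T^(1/3) + ln T) * sum_{i != i*} 1/Delta_i.

    Hypothetical helper for illustration: constants hidden by O(.) are ignored.
    `gaps` holds the suboptimality gaps of the K - 1 suboptimal arms only;
    the unique optimal arm i* (gap 0) is excluded.
    """
    if len(gaps) != K - 1:
        raise ValueError("expected K - 1 gaps, one per suboptimal arm")
    if any(d <= 0 for d in gaps):
        raise ValueError("gaps must be positive; exclude the optimal arm i*")
    # Term driven by the switching cost lambda and the number of arms K.
    switching_term = (lam * K) ** (2 / 3) * T ** (1 / 3)
    # Logarithmic term, as in standard stochastic-bandit bounds.
    log_term = math.log(T)
    # Sum of inverse gaps over suboptimal arms.
    inverse_gap_sum = sum(1.0 / d for d in gaps)
    return (switching_term + log_term) * inverse_gap_sum


if __name__ == "__main__":
    # lambda = 1, K = 8, T = 1000, all seven suboptimal gaps equal to 0.5.
    print(switching_cost_regret_bound(1.0, 8, 1000, [0.5] * 7))
```

For example, with $\lambda = 1$, $K = 8$, $T = 1000$ and all seven gaps equal to $0.5$, the switching term is $8^{2/3} \cdot 1000^{1/3} = 40$, which dominates $\ln 1000 \approx 6.9$. With $\lambda = 0$ the switching term vanishes and the expression reduces to $\ln T \sum_{i \neq i^*} \Delta_i^{-1}$.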