no code implementations • 28 Sep 2023 • Shubhada Agrawal, Timothée Mathieu, Debabrota Basu, Odalric-Ambrym Maillard
In this setting, accommodating potentially unbounded corruptions, we establish a problem-dependent lower bound on regret for a given family of arm distributions.
no code implementations • 19 Jun 2023 • Timothée Mathieu, Riccardo Della Vecchia, Alena Shilova, Matheus Medeiros Centa, Hector Kohler, Odalric-Ambrym Maillard, Philippe Preux
When comparing several RL algorithms, a major question is how many executions must be made, and how we can ensure that the results of such a comparison are theoretically sound.
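The abstract does not specify the statistical procedure, so as an illustrative sketch only: one classical way to compare the evaluation scores of two RL algorithms across independent runs is Welch's t-test, which does not assume equal variances. The function below is a hypothetical helper, not the paper's method.

```python
import math
from statistics import mean, stdev

def welch_t(scores_a, scores_b):
    """Welch's t statistic and degrees of freedom for two independent
    samples of per-run scores (e.g., final returns of two RL agents)."""
    na, nb = len(scores_a), len(scores_b)
    va, vb = stdev(scores_a) ** 2, stdev(scores_b) ** 2
    se2 = va / na + vb / nb                      # squared standard error of the mean difference
    t = (mean(scores_a) - mean(scores_b)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    dof = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, dof
```

With identical score samples the statistic is zero; larger |t| relative to the t-distribution with `dof` degrees of freedom indicates a significant difference between the two algorithms.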
no code implementations • 7 Mar 2022 • Debabrota Basu, Odalric-Ambrym Maillard, Timothée Mathieu
We study the corrupted bandit problem, i.e. a stochastic multi-armed bandit problem with $k$ unknown reward distributions that are heavy-tailed and corrupted by a history-independent adversary or Nature.
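As a minimal sketch of the reward model described above (the exact corruption model in the paper may differ): with probability $\varepsilon$ the observed reward is replaced by an arbitrary value chosen independently of the history, and otherwise it is drawn from a heavy-tailed distribution, here a Pareto law for concreteness. The parameter names are illustrative assumptions.

```python
import random

def corrupted_reward(arm_scale, eps=0.1, alpha=2.5, corruption=1e6, rng=random):
    """One observed reward from a corrupted heavy-tailed arm.

    With probability eps the adversary injects an arbitrary value
    (history-independent corruption); otherwise the reward is drawn
    from a Pareto(alpha) distribution scaled by arm_scale.
    """
    if rng.random() < eps:
        return corruption                      # adversarial outlier
    return arm_scale * rng.paretovariate(alpha)  # heavy-tailed clean reward
```

Because the clean distribution is heavy-tailed and the corruptions are unbounded, the empirical mean of such rewards is unreliable, which motivates robust index policies in this setting.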
no code implementations • 16 Oct 2019 • Stanislav Minsker, Timothée Mathieu
We propose a version of empirical risk minimization based on the idea of replacing sample averages by robust proxies of the expectation, and obtain high-confidence bounds for the excess risk of the resulting estimators.
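One standard robust proxy for the expectation, offered here as an illustrative sketch rather than the paper's exact construction, is the median-of-means estimator: split the sample into blocks, average each block, and return the median of the block means.

```python
from statistics import mean, median

def median_of_means(sample, k):
    """Median-of-means estimate of E[X]: partition the sample into k
    equal-sized blocks, average each block, and take the median of the
    block means. A few outliers can spoil at most a few blocks, so the
    median of the block means remains close to the true mean."""
    block = len(sample) // k
    block_means = [mean(sample[i * block:(i + 1) * block]) for i in range(k)]
    return median(block_means)
```

On a sample of nine 1.0's plus one huge outlier, the plain average is enormous while the median-of-means with five blocks still returns 1.0, which is the behavior that makes such proxies useful inside risk minimization.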