no code implementations • 15 Aug 2023 • Dirk van der Hoeven, Nikita Zhivotovskiy, Nicolò Cesa-Bianchi
Online learning methods yield sequential regret bounds under minimal assumptions and provide in-expectation risk bounds for statistical learning.
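For context, the classical route from regret to risk is online-to-batch conversion: run an online learner over i.i.d. samples and output the average of its iterates, whose excess risk is bounded in expectation by regret/T for convex losses. The following is a generic sketch of that conversion (not this paper's specific construction), using online gradient descent on the squared loss:

```python
import numpy as np

def online_to_batch(samples, eta):
    """Run online gradient descent on the squared loss over a stream of
    (x, y) pairs and return the running average of the iterates: the
    standard online-to-batch conversion, turning a sequential regret
    bound into an in-expectation risk bound for the averaged predictor."""
    d = samples[0][0].shape[0]
    w = np.zeros(d)
    avg = np.zeros(d)
    for t, (x, y) in enumerate(samples, start=1):
        grad = 2 * (w @ x - y) * x   # gradient of the loss (w.x - y)^2
        w = w - eta * grad
        avg += (w - avg) / t         # incremental average of iterates
    return avg
```

On noiseless linear data the averaged iterate recovers the underlying weight vector up to the decaying early-round transient.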
no code implementations • 3 Jul 2023 • Dirk van der Hoeven, Ciara Pike-Burke, Hao Qiu, Nicolò Cesa-Bianchi
Here, each expert must be paid before making their prediction.
no code implementations • 30 May 2023 • Emmanuel Esposito, Saeed Masoudian, Hao Qiu, Dirk van der Hoeven, Nicolò Cesa-Bianchi, Yevgeny Seldin
However, if the mapping of states to losses is stochastic, we show that the regret grows at a rate of $\sqrt{\big(K+\min\{|\mathcal{S}|, d\}\big)T}$ (within log factors), implying that if the number $|\mathcal{S}|$ of states is smaller than the delay, then intermediate observations help.
no code implementations • 15 May 2023 • Dirk van der Hoeven, Lukas Zierahn, Tal Lancewicki, Aviv Rosenberg, Nicolò Cesa-Bianchi
We derive a new analysis of Follow The Regularized Leader (FTRL) for online learning with delayed bandit feedback.
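As background (a hedged sketch of generic FTRL, not the paper's delayed-bandit analysis), FTRL with a negative-entropy regularizer over the probability simplex has a closed-form softmax solution, and delayed feedback simply means only losses whose delays have elapsed enter the cumulative sum:

```python
import numpy as np

def ftrl_entropic(cum_loss, eta):
    """FTRL with negative-entropy regularizer on the simplex:
    argmin_p <p, cum_loss> + (1/eta) * sum_i p_i log p_i,
    whose minimizer is the softmax of -eta * cum_loss."""
    z = -eta * cum_loss
    z -= z.max()                 # shift for numerical stability
    w = np.exp(z)
    return w / w.sum()

# Toy run with a fixed delay d: the loss vector of round s is only
# observed at round s + d, so round t uses losses up to round t - d.
rng = np.random.default_rng(0)
K, T, d, eta = 3, 100, 5, 0.5
losses = rng.uniform(size=(T, K))
cum = np.zeros(K)
for t in range(T):
    p = ftrl_entropic(cum, eta)  # play a distribution over K arms
    if t >= d:
        cum += losses[t - d]     # incorporate the delayed loss vector
```

The sketch uses full-information loss vectors for simplicity; the bandit case would replace them with importance-weighted estimates.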
no code implementations • 9 Oct 2022 • Emmanuel Esposito, Federico Fusco, Dirk van der Hoeven, Nicolò Cesa-Bianchi
The framework of feedback graphs is a generalization of sequential decision-making with bandit or full information feedback.
no code implementations • 6 Jun 2022 • Dirk van der Hoeven, Nikita Zhivotovskiy, Nicolò Cesa-Bianchi
We prove that a variant of EWA either achieves negative regret (i.e., the algorithm outperforms the best expert) or guarantees an $O(\log K)$ bound on both variance and regret.
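For reference (a generic sketch of the standard EWA forecaster, not the paper's variant), EWA weights each expert proportionally to the exponential of its negative cumulative loss and predicts with the weighted mixture:

```python
import numpy as np

def ewa_weights(cum_loss, eta):
    """EWA weights: proportional to exp(-eta * cumulative expert loss)."""
    z = -eta * cum_loss
    z -= z.max()                 # shift for numerical stability
    w = np.exp(z)
    return w / w.sum()

# Toy run: K experts forecasting a binary sequence under squared loss.
rng = np.random.default_rng(2)
K, T, eta = 4, 200, 2.0
expert_preds = rng.uniform(size=(T, K))
outcomes = rng.integers(0, 2, size=T)
cum_loss = np.zeros(K)
for t in range(T):
    w = ewa_weights(cum_loss, eta)
    forecast = w @ expert_preds[t]               # weighted mixture prediction
    cum_loss += (expert_preds[t] - outcomes[t]) ** 2
```

The weights concentrate on the expert with the smallest cumulative loss, which is what yields the usual $O(\log K)$ regret bound.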
no code implementations • 1 Jun 2022 • Chloé Rouyer, Dirk van der Hoeven, Nicolò Cesa-Bianchi, Yevgeny Seldin
The algorithm combines ideas from the EXP3++ algorithm for stochastic and adversarial bandits with the EXP3.G algorithm for feedback graphs, together with a novel exploration scheme.
no code implementations • 2 Nov 2021 • Dirk van der Hoeven, Nicolò Cesa-Bianchi
We study nonstochastic bandits and experts in a delayed setting where delays depend on both time and arms.
no code implementations • NeurIPS 2021 • Dirk van der Hoeven, Federico Fusco, Nicolò Cesa-Bianchi
We study the problem of online multiclass classification in a setting where the learner's feedback is determined by an arbitrary directed graph.
no code implementations • 15 Feb 2021 • Dirk van der Hoeven, Hédi Hadiji, Tim van Erven
In each round, an adversary first activates one of the agents to issue a prediction and provides a corresponding gradient; the agents are then allowed to send a $b$-bit message to their neighbors in the graph.
no code implementations • 12 Feb 2021 • Tim van Erven, Wouter M. Koolen, Dirk van der Hoeven
We provide a new adaptive method for online convex optimization, MetaGrad, that is robust to general convex losses yet achieves faster rates for a broad class of special functions, including exp-concave and strongly convex functions, as well as various types of stochastic and non-stochastic functions without any curvature.
no code implementations • NeurIPS 2020 • Dirk van der Hoeven
In the bandit classification setting we show that Gaptron is the first linear-time algorithm with $O(K\sqrt{T})$ expected regret, where $K$ is the number of classes.
no code implementations • NeurIPS 2020 • Dirk van der Hoeven, Ashok Cutkosky, Haipeng Luo
We study bandit convex optimization methods that adapt to the norm of the comparator, a topic that has only been studied before for its full-information counterpart.
no code implementations • NeurIPS 2019 • Dirk van der Hoeven
In this paper we generalize this approach by allowing the provider of the data to choose the distribution of the noise without disclosing any parameters of the distribution to the learner, under the constraint that the distribution is symmetrical.
no code implementations • 21 Feb 2018 • Dirk van der Hoeven, Tim van Erven, Wojciech Kotłowski
A standard introduction to online learning might place Online Gradient Descent at its center and then proceed to develop generalizations and extensions like Online Mirror Descent and second-order methods.
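The centerpiece mentioned above can be sketched in a few lines (a minimal illustration, assuming linear losses given by gradient vectors and an L2-ball decision set):

```python
import numpy as np

def ogd(grads, eta, radius=1.0):
    """Projected Online Gradient Descent on the L2 ball of given radius:
    w_{t+1} = Proj(w_t - eta * g_t). This is the textbook baseline that
    Online Mirror Descent and second-order methods generalize."""
    d = grads[0].shape[0]
    w = np.zeros(d)
    iterates = []
    for g in grads:
        iterates.append(w.copy())    # record the point played this round
        w = w - eta * g
        norm = np.linalg.norm(w)
        if norm > radius:            # Euclidean projection onto the ball
            w *= radius / norm
    return iterates
```

Against a constant gradient the iterates march to the ball's boundary and stay there, which is the comparator a fixed linear loss favors.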