no code implementations • 15 Aug 2023 • Dirk van der Hoeven, Nikita Zhivotovskiy, Nicolò Cesa-Bianchi
Online learning methods yield sequential regret bounds under minimal assumptions and provide in-expectation risk bounds for statistical learning.
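For context, the classical route from regret to risk is online-to-batch conversion: run an online learner over i.i.d. samples and output the average of its iterates, whose excess risk is bounded in expectation by regret/T for convex losses. The following is a generic sketch of that conversion (not this paper's specific construction), using online gradient descent on the squared loss:

```python
import numpy as np

def online_to_batch(samples, eta):
    """Run online gradient descent on the squared loss over a stream of
    (x, y) pairs and return the running average of the iterates: the
    standard online-to-batch conversion, turning a sequential regret
    bound into an in-expectation risk bound for the averaged predictor."""
    d = samples[0][0].shape[0]
    w = np.zeros(d)
    avg = np.zeros(d)
    for t, (x, y) in enumerate(samples, start=1):
        grad = 2 * (w @ x - y) * x   # gradient of the loss (w.x - y)^2
        w = w - eta * grad
        avg += (w - avg) / t         # incremental average of iterates
    return avg
```

On noiseless linear data the averaged iterate recovers the underlying weight vector up to the decaying early-round transient.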
no code implementations • 3 Jul 2023 • Dirk van der Hoeven, Ciara Pike-Burke, Hao Qiu, Nicolò Cesa-Bianchi
Here, each expert must be paid before making their prediction.
no code implementations • 30 May 2023 • Emmanuel Esposito, Saeed Masoudian, Hao Qiu, Dirk van der Hoeven, Nicolò Cesa-Bianchi, Yevgeny Seldin
However, if the mapping of states to losses is stochastic, we show that the regret grows at a rate of $\sqrt{\big(K+\min\{|\mathcal{S}|, d\}\big)T}$ (within log factors), implying that if the number $|\mathcal{S}|$ of states is smaller than the delay, then intermediate observations help.
no code implementations • 15 May 2023 • Dirk van der Hoeven, Lukas Zierahn, Tal Lancewicki, Aviv Rosenberg, Nicolò Cesa-Bianchi
We derive a new analysis of Follow The Regularized Leader (FTRL) for online learning with delayed bandit feedback.
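As background (a hedged sketch of generic FTRL, not the paper's delayed-bandit analysis), FTRL with a negative-entropy regularizer over the probability simplex has a closed-form softmax solution, and delayed feedback simply means only losses whose delays have elapsed enter the cumulative sum:

```python
import numpy as np

def ftrl_entropic(cum_loss, eta):
    """FTRL with negative-entropy regularizer on the simplex:
    argmin_p <p, cum_loss> + (1/eta) * sum_i p_i log p_i,
    whose minimizer is the softmax of -eta * cum_loss."""
    z = -eta * cum_loss
    z -= z.max()                 # shift for numerical stability
    w = np.exp(z)
    return w / w.sum()

# Toy run with a fixed delay d: the loss vector of round s is only
# observed at round s + d, so round t uses losses up to round t - d.
rng = np.random.default_rng(0)
K, T, d, eta = 3, 100, 5, 0.5
losses = rng.uniform(size=(T, K))
cum = np.zeros(K)
for t in range(T):
    p = ftrl_entropic(cum, eta)  # play a distribution over K arms
    if t >= d:
        cum += losses[t - d]     # incorporate the delayed loss vector
```

The sketch uses full-information loss vectors for simplicity; the bandit case would replace them with importance-weighted estimates.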
no code implementations • 9 Oct 2022 • Emmanuel Esposito, Federico Fusco, Dirk van der Hoeven, Nicolò Cesa-Bianchi
The framework of feedback graphs is a generalization of sequential decision-making with bandit or full information feedback.
no code implementations • 6 Jun 2022 • Dirk van der Hoeven, Nikita Zhivotovskiy, Nicolò Cesa-Bianchi
We prove that a variant of EWA either achieves negative regret (i.e., the algorithm outperforms the best expert) or guarantees an $O(\log K)$ bound on both variance and regret.
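For reference (a generic sketch of the standard EWA forecaster, not the paper's variant), EWA weights each expert proportionally to the exponential of its negative cumulative loss and predicts with the weighted mixture:

```python
import numpy as np

def ewa_weights(cum_loss, eta):
    """EWA weights: proportional to exp(-eta * cumulative expert loss)."""
    z = -eta * cum_loss
    z -= z.max()                 # shift for numerical stability
    w = np.exp(z)
    return w / w.sum()

# Toy run: K experts forecasting a binary sequence under squared loss.
rng = np.random.default_rng(2)
K, T, eta = 4, 200, 2.0
expert_preds = rng.uniform(size=(T, K))
outcomes = rng.integers(0, 2, size=T)
cum_loss = np.zeros(K)
for t in range(T):
    w = ewa_weights(cum_loss, eta)
    forecast = w @ expert_preds[t]               # weighted mixture prediction
    cum_loss += (expert_preds[t] - outcomes[t]) ** 2
```

The weights concentrate on the expert with the smallest cumulative loss, which is what yields the usual $O(\log K)$ regret bound.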
no code implementations • 1 Jun 2022 • Chloé Rouyer, Dirk van der Hoeven, Nicolò Cesa-Bianchi, Yevgeny Seldin
The algorithm combines ideas from the EXP3++ algorithm for stochastic and adversarial bandits with the EXP3.G algorithm for feedback graphs, together with a novel exploration scheme.
no code implementations • 2 Nov 2021 • Dirk van der Hoeven, Nicolò Cesa-Bianchi
We study nonstochastic bandits and experts in a delayed setting where delays depend on both time and arms.
no code implementations • NeurIPS 2021 • Dirk van der Hoeven, Federico Fusco, Nicolò Cesa-Bianchi
We study the problem of online multiclass classification in a setting where the learner's feedback is determined by an arbitrary directed graph.
no code implementations • 15 Feb 2021 • Dirk van der Hoeven, Hédi Hadiji, Tim van Erven
In each round, an adversary first activates one of the agents to issue a prediction and provides a corresponding gradient; the agents are then allowed to send a $b$-bit message to their neighbors in the graph.
no code implementations • 12 Feb 2021 • Tim van Erven, Wouter M. Koolen, Dirk van der Hoeven
We provide a new adaptive method for online convex optimization, MetaGrad, that is robust to general convex losses yet achieves faster rates for a broad class of special functions, including exp-concave and strongly convex functions, as well as various types of stochastic and non-stochastic functions without any curvature.
no code implementations • NeurIPS 2020 • Dirk van der Hoeven
In the bandit classification setting we show that Gaptron is the first linear-time algorithm with $O(K\sqrt{T})$ expected regret, where $K$ is the number of classes.
no code implementations • NeurIPS 2020 • Dirk van der Hoeven, Ashok Cutkosky, Haipeng Luo
We study bandit convex optimization methods that adapt to the norm of the comparator, a topic that has only been studied before for its full-information counterpart.
no code implementations • NeurIPS 2019 • Dirk van der Hoeven
In this paper we generalize this approach by allowing the provider of the data to choose the distribution of the noise without disclosing any parameters of the distribution to the learner, under the constraint that the distribution is symmetrical.
no code implementations • 21 Feb 2018 • Dirk van der Hoeven, Tim van Erven, Wojciech Kotłowski
A standard introduction to online learning might place Online Gradient Descent at its center and then proceed to develop generalizations and extensions like Online Mirror Descent and second-order methods.
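The centerpiece mentioned above can be sketched in a few lines (a minimal illustration, assuming linear losses given by gradient vectors and an L2-ball decision set):

```python
import numpy as np

def ogd(grads, eta, radius=1.0):
    """Projected Online Gradient Descent on the L2 ball of given radius:
    w_{t+1} = Proj(w_t - eta * g_t). This is the textbook baseline that
    Online Mirror Descent and second-order methods generalize."""
    d = grads[0].shape[0]
    w = np.zeros(d)
    iterates = []
    for g in grads:
        iterates.append(w.copy())    # record the point played this round
        w = w - eta * g
        norm = np.linalg.norm(w)
        if norm > radius:            # Euclidean projection onto the ball
            w *= radius / norm
    return iterates
```

Against a constant gradient the iterates march to the ball's boundary and stay there, which is the comparator a fixed linear loss favors.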