Search Results for author: Julian Zimmert

Found 24 papers, 2 papers with code

Optimal cross-learning for contextual bandits with unknown context distributions

no code implementations NeurIPS 2023 Jon Schneider, Julian Zimmert

In this setting, we resolve an open problem of Balseiro et al. by providing an efficient algorithm with a nearly tight (up to logarithmic factors) regret bound of $\widetilde{O}(\sqrt{TK})$, independent of the number of contexts.

Multi-Armed Bandits

Towards Optimal Regret in Adversarial Linear MDPs with Bandit Feedback

no code implementations 17 Oct 2023 Haolin Liu, Chen-Yu Wei, Julian Zimmert

The first algorithm, although computationally inefficient, ensures a regret of $\widetilde{\mathcal{O}}\left(\sqrt{K}\right)$, where $K$ is the number of episodes.

An Improved Best-of-both-worlds Algorithm for Bandits with Delayed Feedback

no code implementations 21 Aug 2023 Saeed Masoudian, Julian Zimmert, Yevgeny Seldin

Another major contribution is demonstrating that the complexity of best-of-both-worlds bandits with delayed feedback is characterized by the cumulative count of outstanding observations after skipping observations with excessively large delays, rather than by the delays or the maximal delay.

A Blackbox Approach to Best of Both Worlds in Bandits and Beyond

no code implementations 20 Feb 2023 Christoph Dann, Chen-Yu Wei, Julian Zimmert

Best-of-both-worlds algorithms for online learning, which achieve near-optimal regret in both the adversarial and the stochastic regimes, have received growing attention recently.

Multi-Armed Bandits

Best of Both Worlds Policy Optimization

no code implementations 18 Feb 2023 Christoph Dann, Chen-Yu Wei, Julian Zimmert

Then we show that under known transitions, we can further obtain a first-order regret bound in the adversarial regime by leveraging the log-barrier regularizer.

Refined Regret for Adversarial MDPs with Linear Function Approximation

no code implementations 30 Jan 2023 Yan Dai, Haipeng Luo, Chen-Yu Wei, Julian Zimmert

This analysis allows the loss estimators to be arbitrarily negative and might be of independent interest.

A Unified Algorithm for Stochastic Path Problems

no code implementations 17 Oct 2022 Christoph Dann, Chen-Yu Wei, Julian Zimmert

Our regret bound matches the best known results for the well-studied special case of stochastic shortest path (SSP) with all non-positive rewards.

A Best-of-Both-Worlds Algorithm for Bandits with Delayed Feedback

no code implementations 29 Jun 2022 Saeed Masoudian, Julian Zimmert, Yevgeny Seldin

We present a modified tuning of the algorithm of Zimmert and Seldin [2020] for adversarial multiarmed bandits with delayed feedback, which in addition to the minimax optimal adversarial regret guarantee shown by Zimmert and Seldin simultaneously achieves a near-optimal regret guarantee in the stochastic setting with fixed delays.

Stochastic Online Learning with Feedback Graphs: Finite-Time and Asymptotic Optimality

no code implementations 20 Jun 2022 Teodor V. Marinov, Mehryar Mohri, Julian Zimmert

We revisit the problem of stochastic online learning with feedback graphs, with the goal of devising algorithms that are optimal, up to constants, both asymptotically and in finite time.

Pushing the Efficiency-Regret Pareto Frontier for Online Learning of Portfolios and Quantum States

no code implementations 6 Feb 2022 Julian Zimmert, Naman Agarwal, Satyen Kale

This algorithm, called SCHRODINGER'S BISONS, is the first efficient algorithm with polylogarithmic regret for this more general problem.

A Model Selection Approach for Corruption Robust Reinforcement Learning

no code implementations 7 Oct 2021 Chen-Yu Wei, Christoph Dann, Julian Zimmert

We develop a model selection approach to tackle reinforcement learning with adversarial corruption in both transition and reward.

Model Selection, Multi-Armed Bandits

Efficient Methods for Online Multiclass Logistic Regression

no code implementations 6 Oct 2021 Naman Agarwal, Satyen Kale, Julian Zimmert

Previous work (Foster et al., 2018) has highlighted the importance of improper predictors for achieving "fast rates" in the online multiclass logistic regression problem without suffering exponentially from secondary problem parameters, such as the norm of the predictors in the comparison class.

Regression

Adapting to Misspecification in Contextual Bandits

no code implementations NeurIPS 2020 Dylan J. Foster, Claudio Gentile, Mehryar Mohri, Julian Zimmert

Given access to an online oracle for square loss regression, our algorithm attains optimal regret and -- in particular -- optimal dependence on the misspecification level, with no prior knowledge.

Multi-Armed Bandits, Regression

Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds for Episodic Reinforcement Learning

no code implementations NeurIPS 2021 Christoph Dann, Teodor V. Marinov, Mehryar Mohri, Julian Zimmert

Our results show that optimistic algorithms cannot achieve the information-theoretic lower bounds even in deterministic MDPs unless there is a unique optimal policy.

Reinforcement Learning (RL)

Model Selection in Contextual Stochastic Bandit Problems

no code implementations NeurIPS 2020 Aldo Pacchiano, My Phan, Yasin Abbasi-Yadkori, Anup Rao, Julian Zimmert, Tor Lattimore, Csaba Szepesvari

Our methods rely on a novel and generic smoothing transformation for bandit algorithms that permits us to obtain optimal $O(\sqrt{T})$ model selection guarantees for stochastic contextual bandit problems as long as the optimal base algorithm satisfies a high probability regret guarantee.

Model Selection, Multi-Armed Bandits

Online Learning for Active Cache Synchronization

1 code implementation ICML 2020 Andrey Kolobov, Sébastien Bubeck, Julian Zimmert

Existing multi-armed bandit (MAB) models make two implicit assumptions: an arm generates a payoff only when it is played, and the agent observes every payoff that is generated.

An Optimal Algorithm for Adversarial Bandits with Arbitrary Delays

no code implementations 14 Oct 2019 Julian Zimmert, Yevgeny Seldin

The result requires no advance knowledge of the delays and resolves an open problem of Thune et al. (2019).

Multi-Armed Bandits

Connections Between Mirror Descent, Thompson Sampling and the Information Ratio

no code implementations NeurIPS 2019 Julian Zimmert, Tor Lattimore

The information-theoretic analysis by Russo and Van Roy (2014) in combination with minimax duality has proved a powerful tool for the analysis of online learning algorithms in full and partial information settings.

Thompson Sampling

Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously

no code implementations 25 Jan 2019 Julian Zimmert, Haipeng Luo, Chen-Yu Wei

We develop the first general semi-bandit algorithm that simultaneously achieves $\mathcal{O}(\log T)$ regret for stochastic environments and $\mathcal{O}(\sqrt{T})$ regret for adversarial environments without knowledge of the regime or the number of rounds $T$.

Tsallis-INF: An Optimal Algorithm for Stochastic and Adversarial Bandits

no code implementations 19 Jul 2018 Julian Zimmert, Yevgeny Seldin

More generally, we define an adversarial regime with a self-bounding constraint, which includes the stochastic regime, the stochastically constrained adversarial regime (Wei and Luo), and the stochastic regime with adversarial corruptions (Lykouris et al.) as special cases, and show that the algorithm achieves a logarithmic regret guarantee in this regime and all of its special cases simultaneously with the adversarial regret guarantee.

Multi-Armed Bandits, Thompson Sampling

Factored Bandits

no code implementations NeurIPS 2018 Julian Zimmert, Yevgeny Seldin

We introduce the factored bandits model, which is a framework for learning with limited (bandit) feedback, where actions can be decomposed into a Cartesian product of atomic actions.

Distributed Optimization of Multi-Class SVMs

1 code implementation 25 Nov 2016 Maximilian Alber, Julian Zimmert, Urun Dogan, Marius Kloft

Training of one-vs.-rest SVMs can be parallelized over the number of classes in a straightforward way.

Distributed Optimization, General Classification
