no code implementations • 23 Feb 2024 • Gergely Neu, Matteo Papini, Ludovic Schwartz
We study the problem of online learning in contextual bandit problems where the loss function is assumed to belong to a known parametric function class.
no code implementations • 21 Feb 2024 • Gergely Neu, Nneka Okolo
We study the performance of stochastic first-order methods for finding saddle points of convex-concave functions.
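As a point of reference, the generic template such methods follow can be sketched as stochastic gradient descent-ascent (GDA). The function, step size, and noise level below are illustrative choices, not the setting analyzed in the paper:

```python
import numpy as np

# Minimal sketch of stochastic gradient descent-ascent (GDA) on the
# strongly convex-concave function f(x, y) = x**2/2 + x*y - y**2/2,
# whose unique saddle point is (0, 0). Illustrative only.
rng = np.random.default_rng(0)
x, y = 1.0, -1.0
eta = 0.05                              # step size
for t in range(2000):
    gx = x + y + 0.01 * rng.normal()    # noisy gradient of f in x
    gy = x - y + 0.01 * rng.normal()    # noisy gradient of f in y
    x -= eta * gx                       # descend in x
    y += eta * gy                       # ascend in y
print(x, y)                             # both approach the saddle point (0, 0)
```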
no code implementations • 2 Oct 2023 • Gergely Neu, Julia Olkhovskaya, Sattar Vakili
We study a generalization of the problem of online learning in adversarial linear contextual bandits by incorporating loss functions that belong to a reproducing kernel Hilbert space, which allows for a more flexible modeling of complex decision-making scenarios.
no code implementations • 27 Sep 2023 • Germano Gabbianelli, Gergely Neu, Matteo Papini
These improvements are made possible by the observation that the upper and lower tails of importance-weighted estimators behave very differently from each other, and that controlling them separately can massively improve on previous results, which were all based on symmetric two-sided concentration inequalities.
no code implementations • 31 May 2023 • Gábor Lugosi, Gergely Neu
We establish a connection between the online and statistical learning setting by showing that the existence of an online learning algorithm with bounded regret in this game implies a bound on the generalization error of the statistical learning algorithm, up to a martingale concentration term that is independent of the complexity of the statistical learning method.
no code implementations • 22 May 2023 • Germano Gabbianelli, Gergely Neu, Nneka Okolo, Matteo Papini
Offline Reinforcement Learning (RL) aims to learn a near-optimal policy from a fixed dataset of transitions collected by another policy.
no code implementations • 27 Feb 2023 • Antoine Moulin, Gergely Neu
We propose a new method for optimistic planning in infinite-horizon discounted Markov decision processes based on the idea of adding regularization to the updates of an otherwise standard approximate value iteration procedure.
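The idea of regularizing value-iteration-style updates can be illustrated with entropy (soft-max) regularization, where the hard maximum over actions is smoothed by a log-sum-exp. The toy two-state MDP and temperature below are made up for illustration; this is a generic sketch, not the paper's optimistic planner:

```python
import numpy as np

# Soft (entropy-regularized) value iteration on a toy 2-state, 2-action MDP:
# the hard max over actions is replaced by a log-sum-exp smoothed by
# temperature tau.
gamma, tau = 0.9, 0.1
r = np.array([[1.0, 0.0],                    # r[s, a]
              [0.0, 0.5]])
P = np.array([[[0.9, 0.1], [0.2, 0.8]],      # P[s, a, s']
              [[0.5, 0.5], [0.1, 0.9]]])
V = np.zeros(2)
for _ in range(500):
    Q = r + gamma * P @ V                    # Q[s, a] = r + gamma * E[V(s')]
    V = tau * np.logaddexp.reduce(Q / tau, axis=1)  # smoothed max over a
print(V)
```

The log-sum-exp operator is still a gamma-contraction, so the iteration converges just like standard value iteration.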
no code implementations • 21 Oct 2022 • Gergely Neu, Nneka Okolo
We propose a new stochastic primal-dual optimization algorithm for planning in a large discounted Markov decision process with a generative model and linear function approximation.
no code implementations • 17 Oct 2022 • Fan Lu, Prashant Mehta, Sean Meyn, Gergely Neu
The main contributions follow: (i) The dual of convex Q-learning is not precisely Manne's LP or a version of logistic Q-learning, but has a similar structure that reveals the need for regularization to avoid over-fitting.
2 code implementations • 22 Sep 2022 • Luca Viano, Angeliki Kamoutsi, Gergely Neu, Igor Krawczuk, Volkan Cevher
Thanks to PPM, we avoid the nested policy evaluation and cost updates for online IL that appear in the prior literature.
no code implementations • 18 Jul 2022 • Germano Gabbianelli, Matteo Papini, Gergely Neu
We study the problem of online learning in adversarial bandit problems under a partial observability model called off-policy feedback.
no code implementations • 27 May 2022 • Gergely Neu, Julia Olkhovskaya, Matteo Papini, Ludovic Schwartz
We study the Bayesian regret of the renowned Thompson Sampling algorithm in contextual bandits with binary losses and adversarially-selected contexts.
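For intuition, a minimal Beta-Bernoulli version of Thompson Sampling (without the contextual structure studied in the paper) looks as follows; the arm means are hypothetical:

```python
import numpy as np

# Thompson Sampling with a Beta-Bernoulli model for a plain multi-armed
# bandit with binary rewards. Each arm keeps a Beta posterior over its
# success probability; we play the arm whose sampled mean is largest.
rng = np.random.default_rng(0)
true_means = np.array([0.3, 0.5, 0.7])   # hypothetical reward probabilities
K = len(true_means)
alpha, beta = np.ones(K), np.ones(K)     # Beta(1, 1) priors
pulls = np.zeros(K, dtype=int)
for t in range(5000):
    theta = rng.beta(alpha, beta)        # one posterior sample per arm
    a = int(np.argmax(theta))
    reward = rng.random() < true_means[a]
    alpha[a] += reward                   # posterior update
    beta[a] += 1 - reward
    pulls[a] += 1
print(pulls)                             # the best arm (index 2) dominates
```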
no code implementations • 10 Feb 2022 • Gábor Lugosi, Gergely Neu
Since the celebrated works of Russo and Zou (2016, 2019) and Xu and Raginsky (2017), it has been well known that the generalization error of supervised learning algorithms can be bounded in terms of the mutual information between their input and the output, given that the loss of any fixed hypothesis has a subgaussian tail.
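Concretely, the bound of Xu and Raginsky (2017) states that if the loss of every fixed hypothesis is $\sigma$-subgaussian, then

\[
\left|\mathbb{E}\!\left[\mathcal{L}(W) - \widehat{\mathcal{L}}_n(W)\right]\right| \;\le\; \sqrt{\frac{2\sigma^2}{n}\, I(W; S)},
\]

where $S$ is the training sample of size $n$, $W$ is the output of the learning algorithm, $\mathcal{L}$ and $\widehat{\mathcal{L}}_n$ are the population and empirical risks, and $I(W;S)$ is the mutual information between input and output.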
no code implementations • 28 Dec 2021 • Mastane Achab, Gergely Neu
In dynamic programming (DP) and reinforcement learning (RL), an agent learns to act optimally in terms of expected long-term return by sequentially interacting with its environment modeled by a Markov decision process (MDP).
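The notion of optimality involved is captured by the Bellman optimality equation: the optimal value function $V^*$ of a discounted MDP satisfies

\[
V^*(s) \;=\; \max_{a} \Big[ r(s,a) + \gamma \sum_{s'} P(s'|s,a)\, V^*(s') \Big],
\]

where $r$ is the reward function, $P$ the transition kernel, and $\gamma \in [0,1)$ the discount factor.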
no code implementations • 24 Sep 2021 • Gábor Lugosi, Gergely Neu, Julia Olkhovskaya
The goal of the decision maker is to select the sequence of agents in a way that maximizes the total number of influenced nodes in the network.
no code implementations • NeurIPS 2021 • Gergely Neu, Julia Olkhovskaya
We consider the problem of online learning in an episodic Markov decision process, where the reward function is allowed to change between episodes in an adversarial manner and the learner only observes the rewards associated with its actions.
no code implementations • 1 Feb 2021 • Gergely Neu, Gintare Karolina Dziugaite, Mahdi Haghifam, Daniel M. Roy
The key factors our bounds depend on are the variance of the gradients (with respect to the data distribution), the local smoothness of the objective function along the SGD path, and the sensitivity of the loss function to perturbations of the final output.
no code implementations • 21 Oct 2020 • Joan Bas-Serrano, Sebastian Curi, Andreas Krause, Gergely Neu
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
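For reference, the unregularized linear program underlying such formulations optimizes over occupancy measures $\mu$:

\[
\max_{\mu \ge 0} \sum_{s,a} \mu(s,a)\, r(s,a)
\quad \text{s.t.} \quad
\sum_a \mu(s',a) \;=\; (1-\gamma)\,\nu_0(s') + \gamma \sum_{s,a} P(s'|s,a)\,\mu(s,a) \quad \forall s',
\]

where $\nu_0$ is the initial-state distribution and $\gamma$ the discount factor; algorithms of the kind described above are derived by adding regularization to a program of this form.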
no code implementations • NeurIPS 2020 • Gergely Neu, Ciara Pike-Burke
The principle of optimism in the face of uncertainty underpins many theoretically successful reinforcement learning algorithms.
no code implementations • 1 Feb 2020 • Gergely Neu, Julia Olkhovskaya
We consider an adversarial variant of the classic $K$-armed linear contextual bandit problem where the sequence of loss functions associated with each arm is allowed to change without restriction over time.
no code implementations • 28 Jan 2020 • Gergely Neu, Nikita Zhivotovskiy
In the setting of sequential prediction of individual $\{0, 1\}$-sequences with expert advice, we show that by allowing the learner to abstain from predicting, at a cost marginally smaller than $\frac{1}{2}$ (say, $0.49$), it is possible to achieve expected regret bounds that are independent of the time horizon $T$.
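For context, the standard exponentially weighted average forecaster without the abstention option can be sketched as follows; the experts' forecasts and the outcome sequence here are synthetic:

```python
import numpy as np

# Exponentially weighted average forecaster for predicting a binary sequence
# with expert advice (no abstention). Each expert's loss is |advice - outcome|.
rng = np.random.default_rng(0)
T, N = 1000, 5
eta = np.sqrt(8 * np.log(N) / T)               # classic tuning of the rate
advice = rng.random((T, N))                    # experts' probabilistic forecasts
outcomes = (advice[:, 0] > 0.5).astype(float)  # expert 0 is the best expert here
weights = np.ones(N)
total_loss = 0.0
for t in range(T):
    p = weights @ advice[t] / weights.sum()    # weighted-average prediction
    total_loss += abs(p - outcomes[t])
    losses = np.abs(advice[t] - outcomes[t])
    weights *= np.exp(-eta * losses)           # multiplicative weight update
best = np.abs(advice - outcomes[:, None]).sum(axis=0).min()
print(total_loss - best)                       # regret is O(sqrt(T log N))
```

Without abstention, the $O(\sqrt{T \log N})$ regret dependence on the horizon $T$ is unavoidable in general, which is exactly what the abstention option removes.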
no code implementations • L4DC 2020 • Joan Bas-Serrano, Gergely Neu
We consider the problem of computing optimal policies in average-reward Markov decision processes.
no code implementations • NeurIPS 2019 • Hugo Penedones, Carlos Riquelme, Damien Vincent, Hartmut Maennel, Timothy Mann, Andre Barreto, Sylvain Gelly, Gergely Neu
We consider the core reinforcement-learning problem of on-policy value function approximation from a batch of trajectory data, and focus on various issues of Temporal Difference (TD) learning and Monte Carlo (MC) policy evaluation.
no code implementations • NeurIPS 2019 • Nicole Mücke, Gergely Neu, Lorenzo Rosasco
While stochastic gradient descent (SGD) is one of the major workhorses in machine learning, the learning properties of many practically used variants are poorly understood.
no code implementations • 8 Feb 2019 • Wojciech Kotłowski, Gergely Neu
We consider a partial-feedback variant of the well-studied online PCA problem where a learner attempts to predict a sequence of $d$-dimensional vectors in terms of a quadratic loss, while only having limited feedback about the environment's choices.
no code implementations • 28 May 2018 • Julia Olkhovskaya, Gergely Neu, Gábor Lugosi
We consider an online influence maximization problem in which a decision maker selects a node among a large number of possibilities and places a piece of information at the node.
no code implementations • 22 Feb 2018 • Gergely Neu, Lorenzo Rosasco
We propose and analyze a variant of the classic Polyak-Ruppert averaging scheme, broadly used in stochastic gradient methods.
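The classic scheme itself fits in a few lines; the quadratic objective, step size, and noise below are illustrative, not the variant analyzed in the paper:

```python
import numpy as np

# SGD with Polyak-Ruppert (uniform iterate) averaging on the quadratic
# objective f(x) = 0.5 * (x - 3)**2 with noisy gradients.
rng = np.random.default_rng(0)
x = 0.0
avg = 0.0
eta = 0.1
T = 5000
for t in range(1, T + 1):
    g = (x - 3.0) + rng.normal()     # stochastic gradient with unit noise
    x -= eta * g
    avg += (x - avg) / t             # running average of the iterates
print(x, avg)                        # the averaged iterate concentrates near 3.0
```

The last iterate keeps fluctuating at a scale set by the step size, while the averaged iterate suppresses the noise at the statistically optimal rate.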
no code implementations • 16 Oct 2017 • Gábor Lugosi, Mihalis G. Markakis, Gergely Neu
Furthermore, we modify the proposed policy in order to perform well in terms of the tracking regret, that is, using as benchmark the best sequence of inventory decisions that switches a limited number of times.
no code implementations • NeurIPS 2017 • Nicolò Cesa-Bianchi, Claudio Gentile, Gábor Lugosi, Gergely Neu
Boltzmann exploration is a classic strategy for sequential decision-making under uncertainty, and is one of the most standard tools in Reinforcement Learning (RL).
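The strategy is simple to state in code. Below is a textbook sketch with a fixed, illustrative temperature; the paper's point concerns how such temperature schedules can succeed or fail, which this sketch does not capture:

```python
import numpy as np

# Boltzmann (softmax) exploration in a 3-armed bandit: actions are drawn
# with probability proportional to exp(Q / temperature).
rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.8])   # hypothetical arm means
Q = np.zeros(3)                          # empirical mean rewards
n = np.zeros(3)
temperature = 0.5
for t in range(3000):
    logits = Q / temperature
    p = np.exp(logits - logits.max())    # stable softmax
    p /= p.sum()
    a = rng.choice(3, p=p)
    reward = rng.random() < true_means[a]
    n[a] += 1
    Q[a] += (reward - Q[a]) / n[a]       # incremental mean update
print(n)                                 # the highest-mean arm gets most pulls
```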
no code implementations • 22 May 2017 • Gergely Neu, Anders Jonsson, Vicenç Gómez
We propose a general framework for entropy-regularized average-reward reinforcement learning in Markov decision processes (MDPs).
no code implementations • ICML 2017 • Tongliang Liu, Gábor Lugosi, Gergely Neu, DaCheng Tao
The bounds are based on martingale inequalities in the Banach space to which the hypotheses belong.
no code implementations • 21 Feb 2017 • Gergely Neu, Vicenç Gómez
We study the problem of online learning in a class of Markov decision processes known as linearly solvable MDPs.
no code implementations • NeurIPS 2015 • Gergely Neu
This work addresses the problem of regret minimization in non-stochastic multi-armed bandit problems, focusing on performance guarantees that hold with high probability.
no code implementations • 17 Mar 2015 • Gergely Neu, Gábor Bartók
We propose a sample-efficient alternative for importance weighting for situations where one only has sample access to the probability distribution that generates the observations.
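For contrast, standard importance weighting with exactly known densities looks as follows (the paper's setting only assumes sample access to the distribution, which this baseline does not handle); the Gaussian choices of $p$ and $q$ are illustrative:

```python
import numpy as np

# Vanilla importance weighting: estimate E_p[f(X)] using samples from q by
# reweighting with the density ratio p/q, here for p = N(0,1), q = N(1,1),
# and f(x) = x**2, so the target value is E_p[X^2] = 1.
rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(1.0, 1.0, size=n)          # samples from q = N(1, 1)

def logpdf(x, mu):                        # Gaussian log-density with sigma = 1
    return -0.5 * (x - mu) ** 2 - 0.5 * np.log(2 * np.pi)

w = np.exp(logpdf(x, 0.0) - logpdf(x, 1.0))   # density ratio p/q
est = np.mean(w * x**2)                       # importance-weighted estimate
print(est)                                    # close to the true value 1.0
```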
no code implementations • 23 Feb 2015 • Gergely Neu
We consider the problem of online combinatorial optimization under semi-bandit feedback, where a learner has to repeatedly pick actions from a combinatorial decision set in order to minimize the total losses associated with its decisions.
no code implementations • NeurIPS 2014 • Amir Sani, Gergely Neu, Alessandro Lazaric
We consider the problem of online optimization, where a learner chooses a decision from a given decision set and suffers some loss associated with the decision and the state of the environment.
no code implementations • NeurIPS 2014 • Tomáš Kocák, Gergely Neu, Michal Valko, Remi Munos
As the predictions of our first algorithm cannot always be computed efficiently in this setting, we propose another algorithm with similar properties and with the benefit of always being computationally efficient, at the price of a slightly more complicated tuning mechanism.
no code implementations • NeurIPS 2014 • Gergely Neu, Michal Valko
Most work on sequential learning assumes a fixed set of actions that are available all the time.
no code implementations • 26 Jun 2014 • Yasin Abbasi-Yadkori, Gergely Neu
We study online learning of finite Markov decision process (MDP) problems when a side information vector is available.
no code implementations • NeurIPS 2013 • Alexander Zimin, Gergely Neu
We study the problem of online learning in finite episodic Markov decision processes where the loss function is allowed to change between episodes.
no code implementations • 13 May 2013 • Gergely Neu, Gábor Bartók
We consider the problem of online combinatorial optimization under semi-bandit feedback.
no code implementations • 20 Jun 2012 • Gergely Neu, Csaba Szepesvári
In this paper we propose a novel gradient algorithm to learn a policy from an expert's observed behavior assuming that the expert behaves optimally with respect to some unknown reward function of a Markovian Decision Problem.
no code implementations • NeurIPS 2010 • Gergely Neu, Andras Antos, András György, Csaba Szepesvári
We consider online learning in finite stochastic Markovian environments where in each time step a new reward function is chosen by an oblivious adversary.