no code implementations • 4 Aug 2023 • Jonatha Anselmi, Bruno Gaujal, Louis-Sébastien Rebuffi
While reinforcement learning in Partially Observable Markov Decision Processes (POMDPs) is prohibitively expensive in general, we show that our algorithm has a regret that depends only sub-linearly on the maximal number of jobs in the network, $S$.
no code implementations • 21 Feb 2023 • Jonatha Anselmi, Bruno Gaujal, Louis-Sébastien Rebuffi
In our main result, however, we exploit the structure of our MDPs to show that the regret of a slightly tweaked version of the classical learning algorithm {\sc Ucrl2} is in fact upper bounded by $\tilde{\mathcal{O}}(\sqrt{E_2AT})$, where $E_2$ is related to the weighted second moment of the stationary measure of a reference policy.
no code implementations • 13 Jan 2023 • Romain Cravic, Nicolas Gast, Bruno Gaujal
We propose the first model-free algorithm that achieves low regret performance for decentralized learning in two-player zero-sum tabular stochastic games with infinite-horizon average-reward objective.
no code implementations • 16 Jun 2021 • Nicolas Gast, Bruno Gaujal, Kimang Khun
While the regret bound and runtime of vanilla implementations of PSRL and UCRL2 are exponential in the number of bandits, we show that the episodic regret of MB-PSRL and MB-UCRL2 is $\tilde{O}(S\sqrt{nK})$, where $K$ is the number of episodes, $n$ is the number of bandits and $S$ is the number of states of each bandit (the exact bound in $S$, $n$ and $K$ is given in the paper).
no code implementations • 16 Dec 2020 • Nicolas Gast, Bruno Gaujal, Chen Yan
In this paper we show that, under the same conditions, the convergence rate is exponential in the number of bandits, unless the fixed point is singular (to be defined later).
Performance • Optimization and Control • Probability
no code implementations • 9 Mar 2013 • Pierre Coucheney, Bruno Gaujal, Panayotis Mertikopoulos
Starting from a heuristic learning scheme for N-person games, we derive a new class of continuous-time learning dynamics consisting of a replicator-like drift adjusted by a penalty term that renders the boundary of the game's strategy space repelling.
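To make the "replicator drift plus boundary-repelling penalty" idea concrete, here is a minimal numerical sketch. It is not the paper's exact dynamics: the penalty function, the entropic choice, and the step sizes below are illustrative assumptions. An entropic penalty is used because its gradient diverges as any strategy weight approaches zero, which is one way to render the boundary of the simplex repelling.

```python
import numpy as np

def replicator_penalty_step(x, payoff, eps=0.05, dt=0.01):
    """One Euler step of replicator-like dynamics with an entropic
    penalty (illustrative choice, not the paper's exact scheme).
    The penalty gradient log(x_i) - sum_j x_j log(x_j) blows up near
    the boundary of the simplex, pushing trajectories back inside."""
    v = payoff @ x                   # payoff of each pure strategy
    avg = x @ v                      # population-average payoff
    drift = x * (v - avg)            # standard replicator drift
    ent = np.log(x) - x @ np.log(x)  # entropic penalty gradient
    dx = drift - eps * x * ent       # penalty-adjusted dynamics
    x_new = x + dt * dx
    return x_new / x_new.sum()       # renormalize after discretization

# Rock-paper-scissors payoffs: plain replicator dynamics cycles here,
# while the penalized dynamics stays in the interior of the simplex.
A = np.array([[0., -1., 1.],
              [1., 0., -1.],
              [-1., 1., 0.]])
x = np.array([0.8, 0.15, 0.05])
for _ in range(1000):
    x = replicator_penalty_step(x, A)
```

After 1000 steps the state remains a fully mixed strategy: every component stays strictly positive even though the initial point is close to a face of the simplex.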