Search Results for author: Bruno Gaujal

Found 6 papers, 0 papers with code

Learning Optimal Admission Control in Partially Observable Queueing Networks

no code implementations · 4 Aug 2023 · Jonatha Anselmi, Bruno Gaujal, Louis-Sébastien Rebuffi

While reinforcement learning in Partially Observable Markov Decision Processes (POMDPs) is prohibitively expensive in general, we show that our algorithm has a regret that depends only sub-linearly on the maximal number of jobs in the network, $S$.
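Several of the abstracts on this page measure performance by regret. In the usual average-reward setting, regret after $T$ steps is $T g^*$ minus the reward actually collected, where $g^*$ is the optimal long-run average reward. The sketch below assumes that standard definition (the papers' exact definitions may differ), and the reward values are hypothetical.

```python
def cumulative_regret(optimal_gain: float, rewards: list[float]) -> float:
    """Average-reward regret after T = len(rewards) steps:
    R(T) = T * g_star - (sum of rewards collected by the learner).
    Assumes the standard average-reward definition of regret."""
    T = len(rewards)
    return T * optimal_gain - sum(rewards)


# Hypothetical run: optimal gain 1.0, learner collects these four rewards.
print(cumulative_regret(1.0, [0.0, 0.5, 1.0, 1.0]))  # 1.5
```

A "sub-linear" regret bound means $R(T)/T \to 0$, i.e. the learner's average reward approaches $g^*$.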

reinforcement-learning

Reinforcement Learning in a Birth and Death Process: Breaking the Dependence on the State Space

no code implementations · 21 Feb 2023 · Jonatha Anselmi, Bruno Gaujal, Louis-Sébastien Rebuffi

In our main result, however, we exploit the structure of our MDPs to show that the regret of a slightly tweaked version of the classical learning algorithm UCRL2 is in fact upper bounded by $\tilde{\mathcal{O}}(\sqrt{E_2AT})$, where $E_2$ is related to the weighted second moment of the stationary measure of a reference policy.

reinforcement-learning, Reinforcement Learning (RL)

Decentralized model-free reinforcement learning in stochastic games with average-reward objective

no code implementations · 13 Jan 2023 · Romain Cravic, Nicolas Gast, Bruno Gaujal

We propose the first model-free algorithm that achieves low regret performance for decentralized learning in two-player zero-sum tabular stochastic games with infinite-horizon average-reward objective.

Q-Learning, reinforcement-learning

Reinforcement Learning for Markovian Bandits: Is Posterior Sampling more Scalable than Optimism?

no code implementations · 16 Jun 2021 · Nicolas Gast, Bruno Gaujal, Kimang Khun

While the regret bound and runtime of vanilla implementations of PSRL and UCRL2 are exponential in the number of bandits, we show that the episodic regret of MB-PSRL and MB-UCRL2 is $\tilde{O}(S\sqrt{nK})$, where $K$ is the number of episodes, $n$ is the number of bandits and $S$ is the number of states of each bandit (the exact bound in $S$, $n$ and $K$ is given in the paper).
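Roughly, the exponential dependence of the vanilla algorithms comes from treating the $n$ bandits as one joint MDP whose state space has $S^n$ states. The sketch below only compares that joint state count with the order $S\sqrt{nK}$ of the MB-PSRL / MB-UCRL2 bound, with constants and logarithmic factors dropped; the parameter values are hypothetical and the function names are illustrative.

```python
import math


def joint_state_count(S: int, n: int) -> int:
    """Number of states of the joint MDP formed by n bandits with S states each."""
    return S ** n


def mb_regret_order(S: int, n: int, K: int) -> float:
    """Order S * sqrt(n * K) of the MB-PSRL / MB-UCRL2 bound (constants and logs dropped)."""
    return S * math.sqrt(n * K)


# Hypothetical setting: S = 4 states per bandit, K = 10,000 episodes.
for n in (2, 5, 10):
    print(n, joint_state_count(4, n), round(mb_regret_order(4, n, 10_000)))
```

Going from 2 to 10 bandits multiplies the joint state count by $4^8 = 65536$, while $S\sqrt{nK}$ grows only by a factor $\sqrt{5}$.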

reinforcement-learning, Reinforcement Learning (RL)

Exponential Convergence Rate for the Asymptotic Optimality of Whittle Index Policy

no code implementations · 16 Dec 2020 · Nicolas Gast, Bruno Gaujal, Chen Yan

In this paper, we show that, under the same conditions, the convergence rate is exponential in the number of bandits, unless the fixed point is singular (to be defined later).

Performance, Optimization and Control, Probability

Penalty-regulated dynamics and robust learning procedures in games

no code implementations · 9 Mar 2013 · Pierre Coucheney, Bruno Gaujal, Panayotis Mertikopoulos

Starting from a heuristic learning scheme for N-person games, we derive a new class of continuous-time learning dynamics consisting of a replicator-like drift adjusted by a penalty term that renders the boundary of the game's strategy space repelling.
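The abstract does not spell out the penalty. As a sketch only, assume an entropic penalty, one standard choice: $\dot{x}_i = x_i\,(u_i - \langle x, u\rangle) - T\,x_i\,(\log x_i - \langle x, \log x\rangle)$, where the first term is the replicator drift and the second pushes mass away from the boundary, since the relative growth rate $\dot{x}_i/x_i$ tends to $+\infty$ as $x_i \to 0$. With a fixed payoff vector $u$, the rest point of this assumed form is the logit (softmax) choice $x_i \propto e^{u_i/T}$. The single player, the payoff vector, the temperature $T$ and the Euler step size are all hypothetical.

```python
import math


def penalized_replicator_step(x, u, temperature, dt):
    """One Euler step of replicator dynamics with an (assumed) entropic penalty:
    dx_i/dt = x_i (u_i - <x, u>) - T x_i (log x_i - <x, log x>)."""
    avg_payoff = sum(xi * ui for xi, ui in zip(x, u))
    avg_log = sum(xi * math.log(xi) for xi in x)
    new_x = [
        xi + dt * (xi * (ui - avg_payoff) - temperature * xi * (math.log(xi) - avg_log))
        for xi, ui in zip(x, u)
    ]
    # The drift sums to zero, so this renormalization only absorbs rounding error.
    total = sum(new_x)
    return [xi / total for xi in new_x]


def simulate(u, temperature=0.5, dt=0.01, steps=20_000):
    """Integrate from the barycenter of the simplex for steps * dt time units."""
    x = [1.0 / len(u)] * len(u)
    for _ in range(steps):
        x = penalized_replicator_step(x, u, temperature, dt)
    return x


# Hypothetical fixed payoffs for a single player with three strategies.
u = [1.0, 0.5, 0.0]
x = simulate(u)
z = sum(math.exp(ui / 0.5) for ui in u)
logit = [math.exp(ui / 0.5) / z for ui in u]
print([round(v, 4) for v in x], [round(v, 4) for v in logit])
```

In a genuine $N$-person game, each $u_i$ would depend on the other players' mixed strategies and be recomputed at every step; the fixed vector here only isolates the effect of the penalty term.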
