1 code implementation • 21 Feb 2024 • Paul Daoudi, Bogdan Robu, Christophe Prieur, Ludovic Dos Santos, Merwan Barlier
This paper addresses the problem of integrating local guide policies into a Reinforcement Learning agent.
no code implementations • 21 Feb 2024 • Paul Daoudi, Bojan Mavkov, Bogdan Robu, Christophe Prieur, Emmanuel Witrant, Merwan Barlier, Ludovic Dos Santos
This paper presents a learning-based control strategy for non-linear throttle valves with asymmetric hysteresis, leading to a near-optimal controller without requiring any prior knowledge about the environment.
no code implementations • 8 Feb 2024 • Alexandre Rio, Merwan Barlier, Igor Colin, Albert Thomas
We address offline reinforcement learning with privacy guarantees, where the goal is to train a policy that is differentially private with respect to individual trajectories in the dataset.
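Trajectory-level differential privacy is typically obtained with the standard Gaussian-mechanism recipe: clip each trajectory's gradient contribution, then add calibrated noise before averaging. The sketch below illustrates that generic recipe only; the clip norm, noise scale, and function names are illustrative assumptions, not the paper's method.

```python
import numpy as np

def private_gradient(per_traj_grads, clip=1.0, sigma=0.5, rng=None):
    """Average per-trajectory gradients with clipping + Gaussian noise.

    Illustrative sketch of trajectory-level DP; `clip` bounds each
    trajectory's influence, `sigma` scales the added noise.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    clipped = []
    for g in per_traj_grads:
        norm = np.linalg.norm(g)
        # Rescale so each trajectory contributes at most `clip` in L2 norm.
        clipped.append(g * min(1.0, clip / max(norm, 1e-12)))
    noise = rng.normal(scale=sigma * clip, size=per_traj_grads[0].shape)
    return (np.sum(clipped, axis=0) + noise) / len(per_traj_grads)

# Two toy per-trajectory gradients; the first gets clipped (norm 5 > 1).
grads = [np.array([3.0, 4.0]), np.array([0.3, 0.4])]
g = private_gradient(grads)
```

With the noise disabled (`sigma=0.0`) the result is just the clipped average, which makes the clipping step easy to verify in isolation.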
no code implementations • 24 Dec 2023 • Paul Daoudi, Christophe Prieur, Bogdan Robu, Merwan Barlier, Ludovic Dos Santos
In the few-shot framework, a limited number of transitions from the target environment are introduced to facilitate a more effective transfer.
no code implementations • 15 Sep 2023 • Hamza Cherkaoui, Merwan Barlier, Igor Colin
In this paper, we address a particular instance of the multi-agent linear stochastic bandit problem, called clustered multi-agent linear bandits.
no code implementations • 15 Sep 2023 • Xuedong Shang, Igor Colin, Merwan Barlier, Hamza Cherkaoui
We introduce the safe best-arm identification framework with linear feedback, where the agent is subject to some stage-wise safety constraint that linearly depends on an unknown parameter vector.
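A stage-wise linear safety constraint of this kind is often enforced pessimistically: an arm is only eligible if an upper confidence bound on its constraint value stays below the threshold. The sketch below shows that generic pattern; `theta_hat`, `beta`, and `tau` are illustrative placeholders, not the paper's construction.

```python
import numpy as np

def safe_arms(arms, theta_hat, beta, tau):
    """Return arms whose pessimistic constraint value a @ theta is <= tau.

    beta * ||a|| stands in for a confidence width around theta_hat.
    """
    return [a for a in arms
            if a @ theta_hat + beta * np.linalg.norm(a) <= tau]

arms = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.7, 0.7])]
theta_hat = np.array([0.2, 0.9])
n_safe = len(safe_arms(arms, theta_hat, beta=0.1, tau=0.9))  # -> 2
```

The second arm is rejected because its upper bound (0.9 + 0.1 = 1.0) exceeds the threshold, even though its point estimate alone would pass.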
no code implementations • 29 Sep 2021 • Paul Daoudi, Merwan Barlier, Ludovic Dos Santos, Aladin Virmaux
We hence introduce Density Conservative Q-Learning (D-CQL), a batch-RL algorithm with strong theoretical guarantees that carefully penalizes the value function based on the amount of information collected in the state-action space.
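In tabular form, a density-based conservative penalty of the kind the abstract describes can be pictured as subtracting a pessimism term that grows where the dataset has few samples. The sketch below is a toy illustration under that assumption; the penalty shape (1/sqrt of visit counts) and all names are hypothetical, not the D-CQL objective.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 4, 2

# Visit counts play the role of collected information per (s, a) pair.
counts = rng.integers(1, 50, size=(n_states, n_actions)).astype(float)
q = rng.normal(size=(n_states, n_actions))

alpha = 0.5  # pessimism weight (hyperparameter)
# Rarely visited pairs receive a larger downward correction.
penalty = alpha / np.sqrt(counts)
q_conservative = q - penalty
```

The effect is that the resulting policy is steered away from poorly covered regions of the state-action space, which is the usual motivation for conservatism in batch RL.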
no code implementations • NeurIPS 2020 • Kevin Scaman, Ludovic Dos Santos, Merwan Barlier, Igor Colin
This novel smoothing method is then used to improve first-order non-smooth optimization (both convex and non-convex) by allowing for a local exploration of the search space.
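A standard way to smooth a non-smooth objective is to average it over random perturbations of the query point, which is one form the "local exploration" idea can take. The sketch below uses Gaussian randomized smoothing as a stand-in; the paper's specific smoothing construction may differ.

```python
import numpy as np

def smoothed(f, x, sigma=0.1, n=2000, rng=None):
    """Monte-Carlo estimate of E[f(x + z)], z ~ N(0, sigma^2 I)."""
    if rng is None:
        rng = np.random.default_rng(0)
    z = rng.normal(scale=sigma, size=(n, x.size))
    return float(np.mean([f(x + zi) for zi in z]))

f = lambda v: np.abs(v).sum()  # non-smooth at the origin
x = np.zeros(1)
val = smoothed(f, x)  # close to sigma * sqrt(2 / pi), i.e. about 0.08
```

At the kink the smoothed value is strictly positive and differentiable, which is what lets first-order methods make progress where the raw objective has no gradient.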