no code implementations • 1 Jan 2021 • Thomas Pierrot, Valentin Macé, Jean-Baptiste Sevestre, Louis Monier, Alexandre Laterre, Nicolas Perrin, Karim Beguir, Olivier Sigaud
Very large action spaces constitute a critical challenge for deep Reinforcement Learning (RL) algorithms.
no code implementations • 1 Jan 2021 • Thomas Pierrot, Valentin Macé, Geoffrey Cideron, Nicolas Perrin, Karim Beguir, Olivier Sigaud
The QD part contributes structural biases by decoupling the search for diversity from the search for high return, resulting in efficient management of the exploration-exploitation trade-off.
no code implementations • 27 Jul 2020 • Thomas Pierrot, Nicolas Perrin, Feryal Behbahani, Alexandre Laterre, Olivier Sigaud, Karim Beguir, Nando de Freitas
Third, the self-models are harnessed to learn recursive compositional programs with multiple levels of abstraction.
no code implementations • 12 Jun 2020 • Nicolas Grislain, Nicolas Perrin, Antoine Thabault
Bidding in real-time auctions can be a difficult stochastic control task, especially if underdelivery incurs strong penalties and the market is very uncertain.
no code implementations • 24 Apr 2020 • Guillaume Matheron, Nicolas Perrin, Olivier Sigaud
In this paper, we propose a new algorithm called "Plan, Backplay, Chain Skills" (PBCS) that combines motion planning and reinforcement learning to solve hard exploration environments.
no code implementations • 26 Nov 2019 • Guillaume Matheron, Nicolas Perrin, Olivier Sigaud
In environments with continuous state and action spaces, state-of-the-art actor-critic reinforcement learning algorithms can solve very complex problems, yet they can also fail in environments that seem trivial, and the reason for such failures is still poorly understood.
1 code implementation • NeurIPS 2019 • Thomas Pierrot, Guillaume Ligner, Scott Reed, Olivier Sigaud, Nicolas Perrin, Alexandre Laterre, David Kas, Karim Beguir, Nando de Freitas
AlphaZero contributes powerful neural network guided search algorithms, which we augment with recursion.
no code implementations • 18 Oct 2018 • Thomas Pierrot, Nicolas Perrin, Olivier Sigaud
In this paper, we provide an overview of first-order and second-order variants of the gradient descent method that are commonly used in machine learning.
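As a rough illustration of the first-order versus second-order distinction mentioned in this abstract, the sketch below compares a plain gradient descent step with a Newton step on a small quadratic objective. It is not taken from the paper: the objective, the matrix `A`, the vector `b`, the learning rate, and all function names are assumptions chosen for illustration.

```python
# Illustrative objective (an assumption, not from the paper):
#   f(x) = 0.5 * x^T A x - b^T x, with A symmetric positive definite.
# Its gradient is A x - b and its Hessian is A.

def gradient(A, b, x):
    """Gradient of the quadratic objective at x (2D case)."""
    return [
        A[0][0] * x[0] + A[0][1] * x[1] - b[0],
        A[1][0] * x[0] + A[1][1] * x[1] - b[1],
    ]

def solve_2x2(A, g):
    """Solve A d = g for a 2x2 matrix A using Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [
        (A[1][1] * g[0] - A[0][1] * g[1]) / det,
        (A[0][0] * g[1] - A[1][0] * g[0]) / det,
    ]

def gradient_descent_step(A, b, x, lr):
    """First-order update: move against the gradient, scaled by lr."""
    g = gradient(A, b, x)
    return [x[0] - lr * g[0], x[1] - lr * g[1]]

def newton_step(A, b, x):
    """Second-order update: rescale the gradient by the inverse Hessian."""
    d = solve_2x2(A, gradient(A, b, x))
    return [x[0] - d[0], x[1] - d[1]]

A = [[3.0, 1.0], [1.0, 2.0]]
b = [1.0, 1.0]
start = [0.0, 0.0]

# On a quadratic, a single Newton step lands on the minimizer of f.
x_newton = newton_step(A, b, start)

# Gradient descent needs many small steps to approach the same point.
x_gd = start
for _ in range(200):
    x_gd = gradient_descent_step(A, b, x_gd, lr=0.1)

print("Newton, 1 step:", x_newton)
print("Gradient descent, 200 steps:", x_gd)
```

The Newton update costs a linear solve per step but uses curvature information, whereas the first-order update is cheaper per step and depends on a well-chosen learning rate. This trade-off is only sketched here on a toy problem.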
no code implementations • 17 Aug 2018 • Aloïs Pourchot, Nicolas Perrin, Olivier Sigaud
Then, from an empirical comparison based on a simple benchmark, we show that although it does provide better sample efficiency, it remains far less sample-efficient than deep reinforcement learning, while being more stable.