no code implementations • 12 Mar 2024 • Simon Dufort-Labbé, Pierluca D'Oro, Evgenii Nikishin, Razvan Pascanu, Pierre-Luc Bacon, Aristide Baratin
When training deep neural networks, the phenomenon of "dying neurons" (units that become inactive or saturated and output zero during training) has traditionally been viewed as undesirable, linked with optimization challenges, and contributing to plasticity loss in continual learning scenarios.
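As a hedged illustration of the phenomenon (not the paper's method or measurements), "dead" ReLU units can be counted as units that never fire across a batch; the layer, biases, and data below are all made up for demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)

# A made-up one-layer ReLU network; extreme negative biases "kill" every unit.
W = rng.normal(size=(16, 8))
b = np.full(16, -100.0)
X = rng.normal(size=(256, 8))  # a batch of random inputs

acts = np.maximum(X @ W.T + b, 0.0)   # ReLU activations over the batch
dead = np.all(acts == 0.0, axis=0)    # a unit is "dead" if it never activates
print(dead.sum(), "of", dead.size, "units are dead")
```

With such biases every pre-activation stays far below zero, so all 16 units register as dead; in practice one would track this statistic over training.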
no code implementations • 7 Feb 2024 • Michel Ma, Tianwei Ni, Clement Gehring, Pierluca D'Oro, Pierre-Luc Bacon
We integrate such AWMs into a policy gradient framework that underscores the relationship between network architectures and the policy gradient updates they inherently represent.
1 code implementation • 17 Jan 2024 • Tianwei Ni, Benjamin Eysenbach, Erfan Seyedsalehi, Michel Ma, Clement Gehring, Aditya Mahajan, Pierre-Luc Bacon
These findings culminate in a set of preliminary guidelines for RL practitioners.
no code implementations • 21 Dec 2023 • Sobhan Mohammadpour, Emmanuel Bengio, Emma Frejinger, Pierre-Luc Bacon
Generative Flow Networks (GFNs) have emerged as a powerful tool for sampling discrete objects from unnormalized distributions, offering a scalable alternative to Markov Chain Monte Carlo (MCMC) methods.
no code implementations • 23 Oct 2023 • Mahan Fathi, Clement Gehring, Jonathan Pilault, David Kanaa, Pierre-Luc Bacon, Ross Goroshin
Koopman representations aim to learn features of nonlinear dynamical systems (NLDS) which lead to linear dynamics in the latent space.
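A minimal sketch of the Koopman idea, not the authors' algorithm: fit a linear operator K on lifted features by least squares, in the style of dynamic mode decomposition. The toy dynamics and hand-picked polynomial feature map below are illustrative assumptions standing in for a learned encoder.

```python
import numpy as np

# Toy nonlinear scalar system (illustrative): x' = 0.9 x + 0.05 x^2
def step(x):
    return 0.9 * x + 0.05 * x ** 2

# Hand-picked polynomial features as a stand-in for a learned encoder.
def lift(x):
    return np.array([x, x ** 2, x ** 3])

# Collect snapshot pairs (x_t, x_{t+1}).
xs = np.linspace(-1.0, 1.0, 50)
X = np.stack([lift(x) for x in xs], axis=1)        # lifted states at time t
Y = np.stack([lift(step(x)) for x in xs], axis=1)  # lifted states at time t+1

# Fit the Koopman matrix K minimizing ||Y - K X||_F via least squares.
K, *_ = np.linalg.lstsq(X.T, Y.T, rcond=None)
K = K.T

# Predict linearly in the lifted space, then read off the state coordinate.
x0 = 0.5
pred = (K @ lift(x0))[0]
print(pred, step(x0))
```

Because the first lifted coordinate of the successor state lies exactly in the span of the features here, the one-step prediction is exact; for general systems the lifted dynamics are only approximately linear.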
1 code implementation • 29 Sep 2023 • Martin Klissarov, Pierluca D'Oro, Shagun Sodhani, Roberta Raileanu, Pierre-Luc Bacon, Pascal Vincent, Amy Zhang, Mikael Henaff
Exploring rich environments and evaluating one's actions without prior knowledge is immensely challenging.
1 code implementation • NeurIPS 2023 • Nate Rahn, Pierluca D'Oro, Harley Wiltzer, Pierre-Luc Bacon, Marc G. Bellemare
Finally, we develop a distribution-aware procedure that finds such paths, navigating away from noisy neighborhoods to improve the robustness of a policy.
2 code implementations • NeurIPS 2023 • Tianwei Ni, Michel Ma, Benjamin Eysenbach, Pierre-Luc Bacon
The Transformer architecture has been very successful at solving problems that involve long-term dependencies, including in the RL domain.
no code implementations • 7 Jun 2023 • Julien Roy, Pierre-Luc Bacon, Christopher Pal, Emmanuel Bengio
In recent years, in-silico molecular design has received much attention from the machine learning community.
no code implementations • 13 Sep 2022 • Leo Feng, Padideh Nouri, Aneri Muni, Yoshua Bengio, Pierre-Luc Bacon
The problem can be framed as global optimization of an expensive black-box objective that can be queried in large batches, but only over a small number of rounds.
1 code implementation • 16 May 2022 • Evgenii Nikishin, Max Schwarzer, Pierluca D'Oro, Pierre-Luc Bacon, Aaron Courville
This work identifies a common flaw of deep reinforcement learning (RL) algorithms: a tendency to rely on early interactions and ignore useful evidence encountered later.
Ranked #8 on Atari 100k
no code implementations • ICLR 2022 • Tristan Deleu, David Kanaa, Leo Feng, Giancarlo Kerg, Yoshua Bengio, Guillaume Lajoie, Pierre-Luc Bacon
Drawing inspiration from gradient-based meta-learning methods with infinitely small gradient steps, we introduce Continuous-Time Meta-Learning (COMLN), a meta-learning algorithm where adaptation follows the dynamics of a gradient vector field.
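The continuous-time adaptation idea can be caricatured as following the gradient flow d theta/dt = -grad L(theta); the quadratic loss and crude forward-Euler integration below are illustrative stand-ins, not COMLN itself.

```python
import numpy as np

# Made-up quadratic inner-loop loss L(theta) = 0.5 * ||theta - target||^2.
target = np.array([1.0, -2.0])

def grad(theta):
    return theta - target  # gradient of the quadratic loss

# Continuous-time adaptation: integrate d theta / dt = -grad L(theta)
# with forward-Euler steps of size dt (a crude ODE solver).
theta = np.zeros(2)
dt = 0.1
for _ in range(100):
    theta = theta - dt * grad(theta)
print(theta)  # approaches `target` as integration time grows
```

Treating adaptation as an ODE means the "number of gradient steps" becomes a continuous integration time, which is the property the COMLN abstract alludes to.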
1 code implementation • 22 Feb 2022 • Nikolaus H. R. Howe, Simon Dufort-Labbé, Nitarshan Rajkumar, Pierre-Luc Bacon
We present Myriad, a testbed written in JAX for learning and planning in real-world continuous environments.
1 code implementation • 22 Dec 2021 • Julien Roy, Roger Girgis, Joshua Romoff, Pierre-Luc Bacon, Christopher Pal
The standard formulation of Reinforcement Learning lacks a practical way of specifying which behaviors are admissible and which are forbidden.
no code implementations • NeurIPS 2021 • Andreea Deac, Petar Veličković, Ognjen Milinković, Pierre-Luc Bacon, Jian Tang, Mladen Nikolić
We find that prior approaches either assume that the environment is provided in such a tabular form -- which is highly restrictive -- or infer "local neighbourhoods" of states to run value iteration over -- for which we discover an algorithmic bottleneck effect.
no code implementations • 29 Sep 2021 • Manuel Del Verme, Pierre-Luc Bacon
We develop a multiple shooting method for learning in deep neural networks based on the Lagrangian perspective on automatic differentiation.
1 code implementation • ICML Workshop URL 2021 • Evgenii Nikishin, Romina Abachi, Rishabh Agarwal, Pierre-Luc Bacon
The shortcomings of maximum likelihood estimation in the context of model-based reinforcement learning have been highlighted by an increasing number of papers.
no code implementations • 10 Mar 2021 • Dilip Arumugam, Peter Henderson, Pierre-Luc Bacon
How do we formalize the challenge of credit assignment in reinforcement learning?
no code implementations • 25 Oct 2020 • Andreea Deac, Petar Veličković, Ognjen Milinković, Pierre-Luc Bacon, Jian Tang, Mladen Nikolić
Value Iteration Networks (VINs) have emerged as a popular method to incorporate planning algorithms within deep reinforcement learning, enabling performance improvements on tasks requiring long-range reasoning and understanding of environment dynamics.
no code implementations • 26 Sep 2020 • Andreea Deac, Pierre-Luc Bacon, Jian Tang
Previously, such planning components have been incorporated through a neural network that partially aligns with the computational graph of value iteration.
no code implementations • 6 Jul 2020 • Joshua Romoff, Peter Henderson, David Kanaa, Emmanuel Bengio, Ahmed Touati, Pierre-Luc Bacon, Joelle Pineau
We investigate whether Jacobi preconditioning, accounting for the bootstrap term in temporal difference (TD) learning, can help boost performance of adaptive optimizers.
no code implementations • 26 Feb 2020 • Jean Harb, Tom Schaul, Doina Precup, Pierre-Luc Bacon
The core idea of this paper is to flip this convention and estimate the value of many policies, for a single set of states.
3 code implementations • 1 Jan 2020 • Khimya Khetarpal, Martin Klissarov, Maxime Chevalier-Boisvert, Pierre-Luc Bacon, Doina Precup
Temporal abstraction refers to the ability of an agent to use behaviours, or controllers, that act for a limited, variable amount of time.
no code implementations • 11 Dec 2019 • Riashat Islam, Raihan Seraj, Pierre-Luc Bacon, Doina Precup
In this work, we propose exploration in policy gradient methods based on maximizing entropy of the discounted future state distribution.
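For a tabular MDP, the discounted future state distribution and its entropy can be computed in closed form; the 3-state chain below is made up, and this sketches only the quantity being maximized, not the paper's policy gradient method.

```python
import numpy as np

gamma = 0.9
# Made-up 3-state transition matrix under a fixed policy (rows sum to 1).
P = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.3, 0.7]])
mu = np.array([1.0, 0.0, 0.0])  # start-state distribution

# Discounted future state distribution:
# d = (1 - gamma) * sum_t gamma^t mu^T P^t = (1 - gamma) mu^T (I - gamma P)^{-1}
d = (1 - gamma) * mu @ np.linalg.inv(np.eye(3) - gamma * P)

# Entropy of d; maximizing this spreads discounted visitation across states.
entropy = -np.sum(d * np.log(d + 1e-12))
print(d, entropy)
```

The geometric series identity guarantees d sums to one, so it is a proper distribution whose entropy is well defined.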
no code implementations • 21 Oct 2019 • Benjamin Petit, Loren Amdahl-Culleton, Yao Liu, Jimmy Smith, Pierre-Luc Bacon
While often stated as an instance of the likelihood ratio trick [Rubinstein, 1989], the original policy gradient theorem [Sutton, 1999] involves an integral over the action space.
no code implementations • ICML 2020 • Yao Liu, Pierre-Luc Bacon, Emma Brunskill
Surprisingly, we find that in finite-horizon MDPs there is no strict variance reduction from per-decision importance sampling or stationary importance sampling compared with vanilla importance sampling.
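The estimators being compared can be sketched on a toy problem: both vanilla and per-decision importance sampling are unbiased for the target-policy return, differing only in which per-step ratios weight each reward. The policies and rewards below are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 3  # horizon

# Illustrative two-action setup with fixed per-step policies.
b = np.array([0.5, 0.5])   # behavior policy
pi = np.array([0.8, 0.2])  # target policy

def reward(a):
    return float(a == 0)   # made-up per-step reward

def estimate(n=50_000):
    ordinary, per_decision = [], []
    for _ in range(n):
        actions = rng.choice(2, size=T, p=b)
        rhos = pi[actions] / b[actions]  # per-step importance ratios
        rewards = np.array([reward(a) for a in actions])
        # Vanilla IS: the full-trajectory ratio multiplies every reward.
        ordinary.append(np.prod(rhos) * rewards.sum())
        # Per-decision IS: reward at step t is weighted only by ratios up to t.
        per_decision.append(np.dot(np.cumprod(rhos), rewards))
    return np.mean(ordinary), np.mean(per_decision)

o, p = estimate()
# True target-policy return here: 0.8 reward per step * 3 steps = 2.4.
print(o, p)
```

Both averages converge to 2.4; the paper's point is that, despite intuition, the per-decision estimator's variance need not be strictly smaller.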
no code implementations • 16 Nov 2018 • Tom Schaul, Hado van Hasselt, Joseph Modayil, Martha White, Adam White, Pierre-Luc Bacon, Jean Harb, Shibl Mourad, Marc Bellemare, Doina Precup
We want to make progress toward artificial general intelligence, namely general-purpose agents that autonomously learn how to competently act in complex environments.
no code implementations • 9 Feb 2018 • Daniel J. Mankowitz, Timothy A. Mann, Pierre-Luc Bacon, Doina Precup, Shie Mannor
We present a Robust Options Policy Iteration (ROPI) algorithm with convergence guarantees, which learns options that are robust to model uncertainty.
3 code implementations • 30 Nov 2017 • Martin Klissarov, Pierre-Luc Bacon, Jean Harb, Doina Precup
We present new results on learning temporally extended actions for continuous tasks, using the options framework (Sutton et al. [1999b], Precup [2000]).
no code implementations • 10 Nov 2017 • Anna Harutyunyan, Peter Vrancx, Pierre-Luc Bacon, Doina Precup, Ann Nowe
Generally, learning with longer options (like learning with multi-step returns) is known to be more efficient.
1 code implementation • 20 Sep 2017 • Peter Henderson, Wei-Di Chang, Pierre-Luc Bacon, David Meger, Joelle Pineau, Doina Precup
Inverse reinforcement learning offers a useful paradigm to learn the underlying reward function directly from expert demonstrations.
1 code implementation • 14 Sep 2017 • Jean Harb, Pierre-Luc Bacon, Martin Klissarov, Doina Precup
Recent work has shown that temporally extended actions (options) can be learned fully end-to-end as opposed to being specified in advance.
no code implementations • ICML 2018 • Ahmed Touati, Pierre-Luc Bacon, Doina Precup, Pascal Vincent
Off-policy learning is key to scaling up reinforcement learning, as it allows learning about a target policy from experience generated by a different behavior policy.
no code implementations • 3 Dec 2016 • Pierre-Luc Bacon, Doina Precup
We show that the Bellman operator underlying the options framework leads to a matrix splitting, an approach traditionally used to speed up convergence of iterative solvers for large linear systems of equations.
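The matrix-splitting view can be illustrated on plain policy evaluation (not the paper's options construction): (I - gamma P) v = r is a linear system, and splitting A = M - N with M = diag(A) gives the classical Jacobi iteration, which converges here because gamma < 1 makes A strictly diagonally dominant. The numbers below are made up.

```python
import numpy as np

gamma = 0.9
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])  # illustrative transition matrix under a policy
r = np.array([1.0, 0.0])    # illustrative rewards

# Policy evaluation as a linear system: (I - gamma * P) v = r.
A = np.eye(2) - gamma * P
v_exact = np.linalg.solve(A, r)

# Matrix splitting A = M - N with M = diag(A) yields the Jacobi iteration
# v <- M^{-1} (N v + r); gamma < 1 makes A strictly diagonally dominant,
# so the iteration converges to the exact value function.
M = np.diag(np.diag(A))
N = M - A
v = np.zeros(2)
for _ in range(500):
    v = np.linalg.solve(M, N @ v + r)
print(v, v_exact)
```

Other splittings (e.g. Gauss-Seidel) give different iterative solvers with different convergence rates, which is the lens the paper applies to the options Bellman operator.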
9 code implementations • 16 Sep 2016 • Pierre-Luc Bacon, Jean Harb, Doina Precup
Temporal abstraction is key to scaling up learning and planning in reinforcement learning.
1 code implementation • 19 Nov 2015 • Emmanuel Bengio, Pierre-Luc Bacon, Joelle Pineau, Doina Precup
In this paper, we use reinforcement learning as a tool to optimize conditional computation policies.