no code implementations • 2 May 2024 • Maksym Korablyov, Cheng-Hao Liu, Moksh Jain, Almer M. van der Sloot, Eric Jolicoeur, Edward Ruediger, Andrei Cristian Nica, Emmanuel Bengio, Kostiantyn Lapchevskyi, Daniel St-Cyr, Doris Alexandra Schuetz, Victor Ion Butoi, Jarrid Rector-Brooks, Simon Blackburn, Leo Feng, Hadi Nekoei, SaiKrishna Gottipati, Priyesh Vijayan, Prateek Gupta, Ladislav Rampášek, Sasikanth Avancha, Pierre-Luc Bacon, William L. Hamilton, Brooks Paige, Sanchit Misra, Stanislaw Kamil Jastrzebski, Bharat Kaul, Doina Precup, José Miguel Hernández-Lobato, Marwin Segler, Michael Bronstein, Anne Marinier, Mike Tyers, Yoshua Bengio
Despite substantial progress in machine learning for scientific discovery in recent years, truly de novo design of small molecules which exhibit a property of interest remains a significant challenge.
no code implementations • 12 Mar 2024 • Simon Dufort-Labbé, Pierluca D'Oro, Evgenii Nikishin, Razvan Pascanu, Pierre-Luc Bacon, Aristide Baratin
When training deep neural networks, the phenomenon of $\textit{dying neurons}$ (units that become inactive or saturated and output zero during training) has traditionally been viewed as undesirable, linked with optimization challenges, and contributing to plasticity loss in continual learning scenarios.
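As a point of reference (illustrative notation, not taken from the paper), a ReLU unit $i$ with parameters $(w_i, b_i)$ can be called dead on a dataset $\mathcal{D}$ when it outputs zero everywhere, $\max(0,\, w_i^\top x + b_i) = 0$ for all $x \in \mathcal{D}$, so no gradient flows through it and it cannot recover on its own during training.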
no code implementations • 7 Feb 2024 • Michel Ma, Tianwei Ni, Clement Gehring, Pierluca D'Oro, Pierre-Luc Bacon
We integrate such AWMs into a policy gradient framework that underscores the relationship between network architectures and the policy gradient updates they inherently represent.
1 code implementation • 17 Jan 2024 • Tianwei Ni, Benjamin Eysenbach, Erfan Seyedsalehi, Michel Ma, Clement Gehring, Aditya Mahajan, Pierre-Luc Bacon
These findings culminate in a set of preliminary guidelines for RL practitioners.
no code implementations • 21 Dec 2023 • Sobhan Mohammadpour, Emmanuel Bengio, Emma Frejinger, Pierre-Luc Bacon
Generative Flow Networks (GFNs) have emerged as a powerful tool for sampling discrete objects from unnormalized distributions, offering a scalable alternative to Markov Chain Monte Carlo (MCMC) methods.
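As background (standard GFN notation, not specific to this paper), a GFN learns a stochastic constructive policy whose marginal probability of producing a terminal object $x$ is proportional to a given unnormalized reward, $P_\theta(x) \propto R(x)$, replacing the mixing-time concerns of MCMC with an amortized generative model.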
no code implementations • 23 Oct 2023 • Mahan Fathi, Clement Gehring, Jonathan Pilault, David Kanaa, Pierre-Luc Bacon, Ross Goroshin
Koopman representations aim to learn features of nonlinear dynamical systems (NLDS) which lead to linear dynamics in the latent space.
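In standard Koopman-style notation (used here only for illustration), one seeks an encoder $\varphi$ and a linear operator $K$ such that the nonlinear dynamics $x_{t+1} = f(x_t)$ become linear in the latent space, $\varphi(x_{t+1}) \approx K\,\varphi(x_t)$, so that long-horizon prediction reduces to repeated application of a fixed matrix.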
1 code implementation • 29 Sep 2023 • Martin Klissarov, Pierluca D'Oro, Shagun Sodhani, Roberta Raileanu, Pierre-Luc Bacon, Pascal Vincent, Amy Zhang, Mikael Henaff
Exploring rich environments and evaluating one's actions without prior knowledge is immensely challenging.
1 code implementation • NeurIPS 2023 • Nate Rahn, Pierluca D'Oro, Harley Wiltzer, Pierre-Luc Bacon, Marc G. Bellemare
To conclude, we develop a distribution-aware procedure which finds such paths, navigating away from noisy neighborhoods in order to improve the robustness of a policy.
2 code implementations • NeurIPS 2023 • Tianwei Ni, Michel Ma, Benjamin Eysenbach, Pierre-Luc Bacon
The Transformer architecture has been very successful at solving problems that involve long-term dependencies, including in the RL domain.
no code implementations • 7 Jun 2023 • Julien Roy, Pierre-Luc Bacon, Christopher Pal, Emmanuel Bengio
In recent years, in-silico molecular design has received much attention from the machine learning community.
no code implementations • 13 Sep 2022 • Leo Feng, Padideh Nouri, Aneri Muni, Yoshua Bengio, Pierre-Luc Bacon
The problem can be framed as a global optimization problem where the objective is an expensive black-box function that can be queried in large batches, but only over a small number of rounds.
1 code implementation • 16 May 2022 • Evgenii Nikishin, Max Schwarzer, Pierluca D'Oro, Pierre-Luc Bacon, Aaron Courville
This work identifies a common flaw of deep reinforcement learning (RL) algorithms: a tendency to rely on early interactions and ignore useful evidence encountered later.
Ranked #8 on Atari Games 100k (Atari 100k)
no code implementations • ICLR 2022 • Tristan Deleu, David Kanaa, Leo Feng, Giancarlo Kerg, Yoshua Bengio, Guillaume Lajoie, Pierre-Luc Bacon
Drawing inspiration from gradient-based meta-learning methods with infinitely small gradient steps, we introduce Continuous-Time Meta-Learning (COMLN), a meta-learning algorithm where adaptation follows the dynamics of a gradient vector field.
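In the spirit of the snippet above (notation illustrative), the discrete inner-loop updates $\theta_{k+1} = \theta_k - \alpha \nabla_\theta \mathcal{L}(\theta_k)$ are replaced, in the infinitesimal-step limit, by the gradient-flow ODE $\frac{d\theta(t)}{dt} = -\nabla_\theta \mathcal{L}(\theta(t))$, with the adaptation horizon $t$ treated as a continuous quantity.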
1 code implementation • 22 Feb 2022 • Nikolaus H. R. Howe, Simon Dufort-Labbé, Nitarshan Rajkumar, Pierre-Luc Bacon
We present Myriad, a testbed written in JAX for learning and planning in real-world continuous environments.
1 code implementation • 22 Dec 2021 • Julien Roy, Roger Girgis, Joshua Romoff, Pierre-Luc Bacon, Christopher Pal
The standard formulation of Reinforcement Learning lacks a practical way of specifying which behaviors are admissible and which are forbidden.
no code implementations • NeurIPS 2021 • Andreea Deac, Petar Veličković, Ognjen Milinković, Pierre-Luc Bacon, Jian Tang, Mladen Nikolić
We find that prior approaches either assume that the environment is provided in such a tabular form -- which is highly restrictive -- or infer "local neighbourhoods" of states to run value iteration over -- for which we discover an algorithmic bottleneck effect.
no code implementations • 29 Sep 2021 • Manuel Del Verme, Pierre-Luc Bacon
We develop a multiple shooting method for learning in deep neural networks based on the Lagrangian perspective on automatic differentiation.
1 code implementation • ICML Workshop URL 2021 • Evgenii Nikishin, Romina Abachi, Rishabh Agarwal, Pierre-Luc Bacon
The shortcomings of maximum likelihood estimation in the context of model-based reinforcement learning have been highlighted by an increasing number of papers.
no code implementations • 10 Mar 2021 • Dilip Arumugam, Peter Henderson, Pierre-Luc Bacon
How do we formalize the challenge of credit assignment in reinforcement learning?
no code implementations • 25 Oct 2020 • Andreea Deac, Petar Veličković, Ognjen Milinković, Pierre-Luc Bacon, Jian Tang, Mladen Nikolić
Value Iteration Networks (VINs) have emerged as a popular method to incorporate planning algorithms within deep reinforcement learning, enabling performance improvements on tasks requiring long-range reasoning and understanding of environment dynamics.
no code implementations • 26 Sep 2020 • Andreea Deac, Pierre-Luc Bacon, Jian Tang
Previously, such planning components have been incorporated through a neural network that partially aligns with the computational graph of value iteration.
no code implementations • 6 Jul 2020 • Joshua Romoff, Peter Henderson, David Kanaa, Emmanuel Bengio, Ahmed Touati, Pierre-Luc Bacon, Joelle Pineau
We investigate whether Jacobi preconditioning, accounting for the bootstrap term in temporal difference (TD) learning, can help boost performance of adaptive optimizers.
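As a rough illustration (not the paper's exact formulation), Jacobi preconditioning rescales each coordinate of the update by the inverse diagonal of the relevant system matrix; for linear TD with features $\phi$, that matrix is $A = \mathbb{E}\big[\phi(s)\,(\phi(s) - \gamma\,\phi(s'))^\top\big]$, whose structure comes from the bootstrap term $\gamma\,\phi(s')$.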
no code implementations • 26 Feb 2020 • Jean Harb, Tom Schaul, Doina Precup, Pierre-Luc Bacon
The core idea of this paper is to flip this convention and estimate the value of many policies, for a single set of states.
3 code implementations • 1 Jan 2020 • Khimya Khetarpal, Martin Klissarov, Maxime Chevalier-Boisvert, Pierre-Luc Bacon, Doina Precup
Temporal abstraction refers to the ability of an agent to use behaviours of controllers which act for a limited, variable amount of time.
no code implementations • 11 Dec 2019 • Riashat Islam, Raihan Seraj, Pierre-Luc Bacon, Doina Precup
In this work, we propose exploration in policy gradient methods based on maximizing entropy of the discounted future state distribution.
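For concreteness (standard definitions, not quoted from the paper), the discounted future state distribution of a policy $\pi$ is $d^\pi_\gamma(s) = (1-\gamma)\sum_{t=0}^{\infty} \gamma^t \Pr(s_t = s \mid \pi)$, and the proposed exploration signal amounts to maximizing its entropy $\mathcal{H}(d^\pi_\gamma) = -\sum_s d^\pi_\gamma(s)\log d^\pi_\gamma(s)$.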
no code implementations • 21 Oct 2019 • Benjamin Petit, Loren Amdahl-Culleton, Yao Liu, Jimmy Smith, Pierre-Luc Bacon
While often stated as an instance of the likelihood ratio trick [Rubinstein, 1989], the original policy gradient theorem [Sutton, 1999] involves an integral over the action space.
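For reference (standard statement, not a quote from the paper), the policy gradient theorem writes the gradient of the expected return as an integral over the action space, $\nabla_\theta J(\theta) = \int_{\mathcal{S}} d^\pi(s) \int_{\mathcal{A}} \nabla_\theta \pi_\theta(a \mid s)\, Q^\pi(s,a)\, da\, ds$, which only coincides with the likelihood-ratio form after multiplying and dividing by $\pi_\theta(a \mid s)$.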
no code implementations • ICML 2020 • Yao Liu, Pierre-Luc Bacon, Emma Brunskill
Surprisingly, we find that in finite horizon MDPs there is no strict variance reduction of per-decision importance sampling or stationary importance sampling compared with vanilla importance sampling.
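As a reminder of the estimators being compared (standard notation), with per-step ratios $\rho_t = \pi(a_t \mid s_t)/\mu(a_t \mid s_t)$, vanilla importance sampling weights the whole return by the full product, $\hat{G} = \big(\prod_{t=0}^{T-1}\rho_t\big)\sum_{t=0}^{T-1}\gamma^t r_t$, whereas per-decision importance sampling weights each reward only by the ratios up to that step, $\hat{G}_{\mathrm{PD}} = \sum_{t=0}^{T-1}\gamma^t \big(\prod_{k=0}^{t}\rho_k\big) r_t$; the point above is that the latter is not guaranteed to have lower variance in finite-horizon MDPs.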
no code implementations • 16 Nov 2018 • Tom Schaul, Hado van Hasselt, Joseph Modayil, Martha White, Adam White, Pierre-Luc Bacon, Jean Harb, Shibl Mourad, Marc Bellemare, Doina Precup
We want to make progress toward artificial general intelligence, namely general-purpose agents that autonomously learn how to competently act in complex environments.
no code implementations • 9 Feb 2018 • Daniel J. Mankowitz, Timothy A. Mann, Pierre-Luc Bacon, Doina Precup, Shie Mannor
We present a Robust Options Policy Iteration (ROPI) algorithm with convergence guarantees, which learns options that are robust to model uncertainty.
3 code implementations • 30 Nov 2017 • Martin Klissarov, Pierre-Luc Bacon, Jean Harb, Doina Precup
We present new results on learning temporally extended actions for continuous tasks, using the options framework (Sutton et al. [1999b], Precup [2000]).
no code implementations • 10 Nov 2017 • Anna Harutyunyan, Peter Vrancx, Pierre-Luc Bacon, Doina Precup, Ann Nowe
Generally, learning with longer options (like learning with multi-step returns) is known to be more efficient.
1 code implementation • 20 Sep 2017 • Peter Henderson, Wei-Di Chang, Pierre-Luc Bacon, David Meger, Joelle Pineau, Doina Precup
Inverse reinforcement learning offers a useful paradigm to learn the underlying reward function directly from expert demonstrations.
1 code implementation • 14 Sep 2017 • Jean Harb, Pierre-Luc Bacon, Martin Klissarov, Doina Precup
Recent work has shown that temporally extended actions (options) can be learned fully end-to-end as opposed to being specified in advance.
no code implementations • ICML 2018 • Ahmed Touati, Pierre-Luc Bacon, Doina Precup, Pascal Vincent
Off-policy learning is key to scaling up reinforcement learning, as it makes it possible to learn about a target policy from the experience generated by a different behavior policy.
no code implementations • 3 Dec 2016 • Pierre-Luc Bacon, Doina Precup
We show that the Bellman operator underlying the options framework leads to a matrix splitting, an approach traditionally used to speed up convergence of iterative solvers for large linear systems of equations.
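To make the linear-algebra analogy concrete (standard textbook form, not the paper's notation), a matrix splitting solves $Ax = b$ by writing $A = M - N$ with $M$ easy to invert and iterating $x_{k+1} = M^{-1}(N x_k + b)$; Jacobi and Gauss-Seidel iterations are the familiar special cases, and the claim above is that planning with options induces a splitting of this kind on the Bellman equations.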
9 code implementations • 16 Sep 2016 • Pierre-Luc Bacon, Jean Harb, Doina Precup
Temporal abstraction is key to scaling up learning and planning in reinforcement learning.
1 code implementation • 19 Nov 2015 • Emmanuel Bengio, Pierre-Luc Bacon, Joelle Pineau, Doina Precup
In this paper, we use reinforcement learning as a tool to optimize conditional computation policies.