no code implementations • 12 Mar 2024 • Simon Dufort-Labbé, Pierluca D'Oro, Evgenii Nikishin, Razvan Pascanu, Pierre-Luc Bacon, Aristide Baratin
When training deep neural networks, the phenomenon of "dying neurons" (units that become inactive or saturated and output zero during training) has traditionally been viewed as undesirable, linked with optimization challenges, and contributing to plasticity loss in continual learning scenarios.
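As a hedged illustration of the phenomenon (not the paper's method or measurements), "dead" ReLU units can be counted as units that never fire across a batch; the layer, biases, and data below are all made up for demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)

# A made-up one-layer ReLU network; extreme negative biases "kill" every unit.
W = rng.normal(size=(16, 8))
b = np.full(16, -100.0)
X = rng.normal(size=(256, 8))  # a batch of random inputs

acts = np.maximum(X @ W.T + b, 0.0)   # ReLU activations over the batch
dead = np.all(acts == 0.0, axis=0)    # a unit is "dead" if it never activates
print(dead.sum(), "of", dead.size, "units are dead")
```

With such biases every pre-activation stays far below zero, so all 16 units register as dead; in practice one would track this statistic over training.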
no code implementations • 7 Feb 2024 • Michel Ma, Tianwei Ni, Clement Gehring, Pierluca D'Oro, Pierre-Luc Bacon
We integrate such AWMs into a policy gradient framework that underscores the relationship between network architectures and the policy gradient updates they inherently represent.
1 code implementation • 17 Jan 2024 • Tianwei Ni, Benjamin Eysenbach, Erfan Seyedsalehi, Michel Ma, Clement Gehring, Aditya Mahajan, Pierre-Luc Bacon
These findings culminate in a set of preliminary guidelines for RL practitioners.
no code implementations • 21 Dec 2023 • Sobhan Mohammadpour, Emmanuel Bengio, Emma Frejinger, Pierre-Luc Bacon
Generative Flow Networks (GFNs) have emerged as a powerful tool for sampling discrete objects from unnormalized distributions, offering a scalable alternative to Markov Chain Monte Carlo (MCMC) methods.
no code implementations • 23 Oct 2023 • Mahan Fathi, Clement Gehring, Jonathan Pilault, David Kanaa, Pierre-Luc Bacon, Ross Goroshin
Koopman representations aim to learn features of nonlinear dynamical systems (NLDS) which lead to linear dynamics in the latent space.
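A minimal sketch of the Koopman idea, not the authors' algorithm: fit a linear operator K on lifted features by least squares, in the style of dynamic mode decomposition. The toy dynamics and hand-picked polynomial feature map below are illustrative assumptions standing in for a learned encoder.

```python
import numpy as np

# Toy nonlinear scalar system (illustrative): x' = 0.9 x + 0.05 x^2
def step(x):
    return 0.9 * x + 0.05 * x ** 2

# Hand-picked polynomial features as a stand-in for a learned encoder.
def lift(x):
    return np.array([x, x ** 2, x ** 3])

# Collect snapshot pairs (x_t, x_{t+1}).
xs = np.linspace(-1.0, 1.0, 50)
X = np.stack([lift(x) for x in xs], axis=1)        # lifted states at time t
Y = np.stack([lift(step(x)) for x in xs], axis=1)  # lifted states at time t+1

# Fit the Koopman matrix K minimizing ||Y - K X||_F via least squares.
K, *_ = np.linalg.lstsq(X.T, Y.T, rcond=None)
K = K.T

# Predict linearly in the lifted space, then read off the state coordinate.
x0 = 0.5
pred = (K @ lift(x0))[0]
print(pred, step(x0))
```

Because the first lifted coordinate of the successor state lies exactly in the span of the features here, the one-step prediction is exact; for general systems the lifted dynamics are only approximately linear.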
1 code implementation • 29 Sep 2023 • Martin Klissarov, Pierluca D'Oro, Shagun Sodhani, Roberta Raileanu, Pierre-Luc Bacon, Pascal Vincent, Amy Zhang, Mikael Henaff
Exploring rich environments and evaluating one's actions without prior knowledge is immensely challenging.
1 code implementation • NeurIPS 2023 • Nate Rahn, Pierluca D'Oro, Harley Wiltzer, Pierre-Luc Bacon, Marc G. Bellemare
Finally, we develop a distribution-aware procedure that finds such paths, navigating away from noisy neighborhoods to improve the robustness of a policy.
2 code implementations • NeurIPS 2023 • Tianwei Ni, Michel Ma, Benjamin Eysenbach, Pierre-Luc Bacon
The Transformer architecture has been very successful at solving problems that involve long-term dependencies, including in the RL domain.
no code implementations • 7 Jun 2023 • Julien Roy, Pierre-Luc Bacon, Christopher Pal, Emmanuel Bengio
In recent years, in-silico molecular design has received much attention from the machine learning community.
no code implementations • 13 Sep 2022 • Leo Feng, Padideh Nouri, Aneri Muni, Yoshua Bengio, Pierre-Luc Bacon
The problem can be framed as global optimization of an expensive black-box objective that can be queried in large batches, but only over a small number of rounds.
1 code implementation • 16 May 2022 • Evgenii Nikishin, Max Schwarzer, Pierluca D'Oro, Pierre-Luc Bacon, Aaron Courville
This work identifies a common flaw of deep reinforcement learning (RL) algorithms: a tendency to rely on early interactions and ignore useful evidence encountered later.
Ranked #8 on Atari 100k
no code implementations • ICLR 2022 • Tristan Deleu, David Kanaa, Leo Feng, Giancarlo Kerg, Yoshua Bengio, Guillaume Lajoie, Pierre-Luc Bacon
Drawing inspiration from gradient-based meta-learning methods with infinitely small gradient steps, we introduce Continuous-Time Meta-Learning (COMLN), a meta-learning algorithm where adaptation follows the dynamics of a gradient vector field.
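The continuous-time adaptation idea can be caricatured as following the gradient flow d theta/dt = -grad L(theta); the quadratic loss and crude forward-Euler integration below are illustrative stand-ins, not COMLN itself.

```python
import numpy as np

# Made-up quadratic inner-loop loss L(theta) = 0.5 * ||theta - target||^2.
target = np.array([1.0, -2.0])

def grad(theta):
    return theta - target  # gradient of the quadratic loss

# Continuous-time adaptation: integrate d theta / dt = -grad L(theta)
# with forward-Euler steps of size dt (a crude ODE solver).
theta = np.zeros(2)
dt = 0.1
for _ in range(100):
    theta = theta - dt * grad(theta)
print(theta)  # approaches `target` as integration time grows
```

Treating adaptation as an ODE means the "number of gradient steps" becomes a continuous integration time, which is the property the COMLN abstract alludes to.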
1 code implementation • 22 Feb 2022 • Nikolaus H. R. Howe, Simon Dufort-Labbé, Nitarshan Rajkumar, Pierre-Luc Bacon
We present Myriad, a testbed written in JAX for learning and planning in real-world continuous environments.
1 code implementation • 22 Dec 2021 • Julien Roy, Roger Girgis, Joshua Romoff, Pierre-Luc Bacon, Christopher Pal
The standard formulation of Reinforcement Learning lacks a practical way of specifying which behaviors are admissible and which are forbidden.
no code implementations • NeurIPS 2021 • Andreea Deac, Petar Veličković, Ognjen Milinković, Pierre-Luc Bacon, Jian Tang, Mladen Nikolić
We find that prior approaches either assume that the environment is provided in such a tabular form -- which is highly restrictive -- or infer "local neighbourhoods" of states to run value iteration over -- for which we discover an algorithmic bottleneck effect.
no code implementations • 29 Sep 2021 • Manuel Del Verme, Pierre-Luc Bacon
We develop a multiple shooting method for learning in deep neural networks based on the Lagrangian perspective on automatic differentiation.
1 code implementation • ICML Workshop URL 2021 • Evgenii Nikishin, Romina Abachi, Rishabh Agarwal, Pierre-Luc Bacon
The shortcomings of maximum likelihood estimation in the context of model-based reinforcement learning have been highlighted by an increasing number of papers.
no code implementations • 10 Mar 2021 • Dilip Arumugam, Peter Henderson, Pierre-Luc Bacon
How do we formalize the challenge of credit assignment in reinforcement learning?
no code implementations • 25 Oct 2020 • Andreea Deac, Petar Veličković, Ognjen Milinković, Pierre-Luc Bacon, Jian Tang, Mladen Nikolić
Value Iteration Networks (VINs) have emerged as a popular method to incorporate planning algorithms within deep reinforcement learning, enabling performance improvements on tasks requiring long-range reasoning and understanding of environment dynamics.
no code implementations • 26 Sep 2020 • Andreea Deac, Pierre-Luc Bacon, Jian Tang
Previously, such planning components have been incorporated through a neural network that partially aligns with the computational graph of value iteration.
no code implementations • 6 Jul 2020 • Joshua Romoff, Peter Henderson, David Kanaa, Emmanuel Bengio, Ahmed Touati, Pierre-Luc Bacon, Joelle Pineau
We investigate whether Jacobi preconditioning, accounting for the bootstrap term in temporal difference (TD) learning, can help boost performance of adaptive optimizers.
no code implementations • 26 Feb 2020 • Jean Harb, Tom Schaul, Doina Precup, Pierre-Luc Bacon
The core idea of this paper is to flip this convention and estimate the value of many policies, for a single set of states.
3 code implementations • 1 Jan 2020 • Khimya Khetarpal, Martin Klissarov, Maxime Chevalier-Boisvert, Pierre-Luc Bacon, Doina Precup
Temporal abstraction refers to the ability of an agent to use behaviours, or controllers, that act for a limited, variable amount of time.
no code implementations • 11 Dec 2019 • Riashat Islam, Raihan Seraj, Pierre-Luc Bacon, Doina Precup
In this work, we propose exploration in policy gradient methods based on maximizing entropy of the discounted future state distribution.
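For a tabular MDP, the discounted future state distribution and its entropy can be computed in closed form; the 3-state chain below is made up, and this sketches only the quantity being maximized, not the paper's policy gradient method.

```python
import numpy as np

gamma = 0.9
# Made-up 3-state transition matrix under a fixed policy (rows sum to 1).
P = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.3, 0.7]])
mu = np.array([1.0, 0.0, 0.0])  # start-state distribution

# Discounted future state distribution:
# d = (1 - gamma) * sum_t gamma^t mu^T P^t = (1 - gamma) mu^T (I - gamma P)^{-1}
d = (1 - gamma) * mu @ np.linalg.inv(np.eye(3) - gamma * P)

# Entropy of d; maximizing this spreads discounted visitation across states.
entropy = -np.sum(d * np.log(d + 1e-12))
print(d, entropy)
```

The geometric series identity guarantees d sums to one, so it is a proper distribution whose entropy is well defined.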
no code implementations • 21 Oct 2019 • Benjamin Petit, Loren Amdahl-Culleton, Yao Liu, Jimmy Smith, Pierre-Luc Bacon
While often stated as an instance of the likelihood ratio trick [Rubinstein, 1989], the original policy gradient theorem [Sutton, 1999] involves an integral over the action space.
no code implementations • ICML 2020 • Yao Liu, Pierre-Luc Bacon, Emma Brunskill
Surprisingly, we find that in finite-horizon MDPs there is no strict variance reduction from per-decision importance sampling or stationary importance sampling compared with vanilla importance sampling.
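The estimators being compared can be sketched on a toy problem: both vanilla and per-decision importance sampling are unbiased for the target-policy return, differing only in which per-step ratios weight each reward. The policies and rewards below are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 3  # horizon

# Illustrative two-action setup with fixed per-step policies.
b = np.array([0.5, 0.5])   # behavior policy
pi = np.array([0.8, 0.2])  # target policy

def reward(a):
    return float(a == 0)   # made-up per-step reward

def estimate(n=50_000):
    ordinary, per_decision = [], []
    for _ in range(n):
        actions = rng.choice(2, size=T, p=b)
        rhos = pi[actions] / b[actions]  # per-step importance ratios
        rewards = np.array([reward(a) for a in actions])
        # Vanilla IS: the full-trajectory ratio multiplies every reward.
        ordinary.append(np.prod(rhos) * rewards.sum())
        # Per-decision IS: reward at step t is weighted only by ratios up to t.
        per_decision.append(np.dot(np.cumprod(rhos), rewards))
    return np.mean(ordinary), np.mean(per_decision)

o, p = estimate()
# True target-policy return here: 0.8 reward per step * 3 steps = 2.4.
print(o, p)
```

Both averages converge to 2.4; the paper's point is that, despite intuition, the per-decision estimator's variance need not be strictly smaller.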
no code implementations • 16 Nov 2018 • Tom Schaul, Hado van Hasselt, Joseph Modayil, Martha White, Adam White, Pierre-Luc Bacon, Jean Harb, Shibl Mourad, Marc Bellemare, Doina Precup
We want to make progress toward artificial general intelligence, namely general-purpose agents that autonomously learn how to competently act in complex environments.
no code implementations • 9 Feb 2018 • Daniel J. Mankowitz, Timothy A. Mann, Pierre-Luc Bacon, Doina Precup, Shie Mannor
We present a Robust Options Policy Iteration (ROPI) algorithm with convergence guarantees, which learns options that are robust to model uncertainty.
3 code implementations • 30 Nov 2017 • Martin Klissarov, Pierre-Luc Bacon, Jean Harb, Doina Precup
We present new results on learning temporally extended actions for continuous tasks, using the options framework (Sutton et al. [1999b], Precup [2000]).
no code implementations • 10 Nov 2017 • Anna Harutyunyan, Peter Vrancx, Pierre-Luc Bacon, Doina Precup, Ann Nowe
Generally, learning with longer options (like learning with multi-step returns) is known to be more efficient.
1 code implementation • 20 Sep 2017 • Peter Henderson, Wei-Di Chang, Pierre-Luc Bacon, David Meger, Joelle Pineau, Doina Precup
Inverse reinforcement learning offers a useful paradigm to learn the underlying reward function directly from expert demonstrations.
1 code implementation • 14 Sep 2017 • Jean Harb, Pierre-Luc Bacon, Martin Klissarov, Doina Precup
Recent work has shown that temporally extended actions (options) can be learned fully end-to-end as opposed to being specified in advance.
no code implementations • ICML 2018 • Ahmed Touati, Pierre-Luc Bacon, Doina Precup, Pascal Vincent
Off-policy learning is key to scaling up reinforcement learning, as it allows learning about a target policy from experience generated by a different behavior policy.
no code implementations • 3 Dec 2016 • Pierre-Luc Bacon, Doina Precup
We show that the Bellman operator underlying the options framework leads to a matrix splitting, an approach traditionally used to speed up convergence of iterative solvers for large linear systems of equations.
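The matrix-splitting view can be illustrated on plain policy evaluation (not the paper's options construction): (I - gamma P) v = r is a linear system, and splitting A = M - N with M = diag(A) gives the classical Jacobi iteration, which converges here because gamma < 1 makes A strictly diagonally dominant. The numbers below are made up.

```python
import numpy as np

gamma = 0.9
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])  # illustrative transition matrix under a policy
r = np.array([1.0, 0.0])    # illustrative rewards

# Policy evaluation as a linear system: (I - gamma * P) v = r.
A = np.eye(2) - gamma * P
v_exact = np.linalg.solve(A, r)

# Matrix splitting A = M - N with M = diag(A) yields the Jacobi iteration
# v <- M^{-1} (N v + r); gamma < 1 makes A strictly diagonally dominant,
# so the iteration converges to the exact value function.
M = np.diag(np.diag(A))
N = M - A
v = np.zeros(2)
for _ in range(500):
    v = np.linalg.solve(M, N @ v + r)
print(v, v_exact)
```

Other splittings (e.g. Gauss-Seidel) give different iterative solvers with different convergence rates, which is the lens the paper applies to the options Bellman operator.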
9 code implementations • 16 Sep 2016 • Pierre-Luc Bacon, Jean Harb, Doina Precup
Temporal abstraction is key to scaling up learning and planning in reinforcement learning.
1 code implementation • 19 Nov 2015 • Emmanuel Bengio, Pierre-Luc Bacon, Joelle Pineau, Doina Precup
In this paper, we use reinforcement learning as a tool to optimize conditional computation policies.