Search Results for author: Pierre-Luc Bacon

Found 36 papers, 14 papers with code

Maxwell's Demon at Work: Efficient Pruning by Leveraging Saturation of Neurons

no code implementations12 Mar 2024 Simon Dufort-Labbé, Pierluca D'Oro, Evgenii Nikishin, Razvan Pascanu, Pierre-Luc Bacon, Aristide Baratin

When training deep neural networks, the phenomenon of $\textit{dying neurons}$ (units that become inactive or saturated and output zero during training) has traditionally been viewed as undesirable, linked with optimization challenges, and contributing to plasticity loss in continual learning scenarios.

Continual Learning · Model Compression
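A minimal NumPy sketch of the saturation signature described above, not the paper's pruning method: units that never activate on a probe batch contribute nothing, so removing them (and their outgoing weights) leaves the network's function unchanged on that batch. The layer sizes and the forced-dead biases are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(32, 8)), rng.normal(size=32)
b1[:8] = -100.0                 # force a block of units to saturate dead
W2 = rng.normal(size=(4, 32))

X = rng.normal(size=(256, 8))   # probe batch
H = np.maximum(X @ W1.T + b1, 0.0)
alive = (H > 0).any(axis=0)     # units that fire at least once

W1p, b1p, W2p = W1[alive], b1[alive], W2[:, alive]
Hp = np.maximum(X @ W1p.T + b1p, 0.0)
assert np.allclose(H @ W2.T, Hp @ W2p.T)   # same function, smaller layer
print(f"kept {alive.sum()}/{alive.size} units")
```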

Do Transformer World Models Give Better Policy Gradients?

no code implementations7 Feb 2024 Michel Ma, Tianwei Ni, Clement Gehring, Pierluca D'Oro, Pierre-Luc Bacon

We integrate such Actions World Models (AWMs) into a policy gradient framework that underscores the relationship between network architectures and the policy gradient updates they inherently represent.
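To make the idea concrete, a hedged toy sketch, not the paper's AWM architecture: with a linear one-step model and a linear policy, the policy gradient can be obtained by backpropagating the return through the model's own computational graph; the hand-derived gradient matches finite differences. The dynamics and reward below are invented for illustration.

```python
import numpy as np

A, B = 0.9, 0.5                      # toy "learned" world model: s' = A*s + B*a

def rollout_return(theta, s, T=10):
    G = 0.0
    for _ in range(T):
        a = theta * s                # linear policy
        s = A * s + B * a            # model step
        G += -s**2                   # reward computed on the model's prediction
    return G

def grad_theta(theta, s, T=10):
    # backprop by hand: ds/dtheta accumulates through the model dynamics
    G_grad, ds = 0.0, 0.0
    for _ in range(T):
        a = theta * s
        ds = A * ds + B * (s + theta * ds)
        s = A * s + B * a
        G_grad += -2.0 * s * ds
    return G_grad

theta, s0, eps = 0.3, 1.0, 1e-5
fd = (rollout_return(theta + eps, s0) - rollout_return(theta - eps, s0)) / (2 * eps)
print(grad_theta(theta, s0), fd)     # the two estimates should agree
```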


Maximum entropy GFlowNets with soft Q-learning

no code implementations21 Dec 2023 Sobhan Mohammadpour, Emmanuel Bengio, Emma Frejinger, Pierre-Luc Bacon

Generative Flow Networks (GFNs) have emerged as a powerful tool for sampling discrete objects from unnormalized distributions, offering a scalable alternative to Markov Chain Monte Carlo (MCMC) methods.

Q-Learning · Reinforcement Learning (RL)
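For reference, a tabular sketch of the soft Q-learning backup the title refers to (the MDP below is a random toy, not from the paper): the soft value is a log-sum-exp over actions, and the induced policy is Boltzmann in Q.

```python
import numpy as np

rng = np.random.default_rng(1)
n_s, n_a, gamma, tau = 4, 3, 0.95, 1.0              # tau: entropy temperature
R = rng.normal(size=(n_s, n_a))
P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))    # P[s, a] over next states

Q = np.zeros((n_s, n_a))
for _ in range(500):
    V = tau * np.logaddexp.reduce(Q / tau, axis=1)  # soft max over actions
    Q = R + gamma * P @ V                           # soft Bellman backup

pi = np.exp((Q - Q.max(axis=1, keepdims=True)) / tau)
pi /= pi.sum(axis=1, keepdims=True)                 # induced Boltzmann policy
print(pi.round(3))
```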

Course Correcting Koopman Representations

no code implementations23 Oct 2023 Mahan Fathi, Clement Gehring, Jonathan Pilault, David Kanaa, Pierre-Luc Bacon, Ross Goroshin

Koopman representations aim to learn features of nonlinear dynamical systems (NLDS) which lead to linear dynamics in the latent space.
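A minimal DMD-style illustration of that goal, using a classic textbook system with a known finite-dimensional Koopman subspace (not the paper's learned representation): with the right observables, a least-squares fit recovers exactly linear latent dynamics.

```python
import numpy as np

mu, lam = 0.9, 0.5
def step(x):    # NLDS whose Koopman subspace is spanned by (x1, x2, x1^2)
    return np.array([mu * x[0], lam * x[1] + (mu**2 - lam) * x[0]**2])

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
Y = np.stack([step(x) for x in X])
feats = lambda S: np.column_stack([S[:, 0], S[:, 1], S[:, 0]**2])
Z, Zn = feats(X), feats(Y)

K, *_ = np.linalg.lstsq(Z, Zn, rcond=None)   # fit Zn ≈ Z @ K
print(np.abs(Zn - Z @ K).max())              # ≈ 0: latent dynamics are linear
```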

Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control

1 code implementation NeurIPS 2023 Nate Rahn, Pierluca D'Oro, Harley Wiltzer, Pierre-Luc Bacon, Marc G. Bellemare

To conclude, we develop a distribution-aware procedure which finds such paths, navigating away from noisy neighborhoods in order to improve the robustness of a policy.

Continuous Control

When Do Transformers Shine in RL? Decoupling Memory from Credit Assignment

2 code implementations NeurIPS 2023 Tianwei Ni, Michel Ma, Benjamin Eysenbach, Pierre-Luc Bacon

The Transformer architecture has been very successful at solving problems that involve long-term dependencies, including in the RL domain.

Reinforcement Learning (RL)

Goal-conditioned GFlowNets for Controllable Multi-Objective Molecular Design

no code implementations7 Jun 2023 Julien Roy, Pierre-Luc Bacon, Christopher Pal, Emmanuel Bengio

In recent years, in-silico molecular design has received much attention from the machine learning community.

Designing Biological Sequences via Meta-Reinforcement Learning and Bayesian Optimization

no code implementations13 Sep 2022 Leo Feng, Padideh Nouri, Aneri Muni, Yoshua Bengio, Pierre-Luc Bacon

The problem can be framed as a global optimization problem where the objective is an expensive black-box function that we can query in large batches, but only over a small number of rounds.

Bayesian Optimization · Meta-Learning · +3

The Primacy Bias in Deep Reinforcement Learning

1 code implementation16 May 2022 Evgenii Nikishin, Max Schwarzer, Pierluca D'Oro, Pierre-Luc Bacon, Aaron Courville

This work identifies a common flaw of deep reinforcement learning (RL) algorithms: a tendency to rely on early interactions and ignore useful evidence encountered later.

Atari Games 100k · reinforcement-learning · +1
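As far as I recall, the remedy proposed in the paper is strikingly simple: periodically re-initialize the network while keeping the replay buffer, so later data gets an equal chance to shape the weights. A stripped-down sketch with all RL machinery stubbed out (the reset period and sizes are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
def fresh_params():                      # stand-in for reinitializing a net
    return rng.normal(size=16)

params, replay, resets = fresh_params(), [], 0
for step in range(10_000):
    replay.append(step)                  # stand-in for storing a transition
    # ... gradient updates on (params, replay) would go here ...
    if (step + 1) % 2_500 == 0:          # assumed reset period
        params, resets = fresh_params(), resets + 1   # weights forgotten,
                                                      # replay buffer kept
print(resets, len(replay))
```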

Continuous-Time Meta-Learning with Forward Mode Differentiation

no code implementations ICLR 2022 Tristan Deleu, David Kanaa, Leo Feng, Giancarlo Kerg, Yoshua Bengio, Guillaume Lajoie, Pierre-Luc Bacon

Drawing inspiration from gradient-based meta-learning methods with infinitely small gradient steps, we introduce Continuous-Time Meta-Learning (COMLN), a meta-learning algorithm where adaptation follows the dynamics of a gradient vector field.

Few-Shot Image Classification · Meta-Learning
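A toy sketch of adaptation as a gradient flow, the continuous-time view named above: integrate dθ/dt = −∇L(θ) with Euler steps on a least-squares task. COMLN's actual solver, gradient estimation, and meta-objective are more involved; everything below is illustrative.

```python
import numpy as np

def loss_grad(theta, X, y):              # gradient of a least-squares task
    return X.T @ (X @ theta - y) / len(y)

rng = np.random.default_rng(6)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5])       # noiseless targets

theta, dt, T = np.zeros(3), 0.05, 200    # T * dt plays the role of adaptation time
for _ in range(T):
    theta = theta - dt * loss_grad(theta, X, y)   # Euler step on the flow
print(theta.round(3))                    # -> approx [1, -2, 0.5]
```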

Direct Behavior Specification via Constrained Reinforcement Learning

1 code implementation22 Dec 2021 Julien Roy, Roger Girgis, Joshua Romoff, Pierre-Luc Bacon, Christopher Pal

The standard formulation of Reinforcement Learning lacks a practical way of specifying which behaviors are admissible and which are forbidden.

Continuous Control · reinforcement-learning · +1
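The standard constrained-RL recipe this line of work builds on can be sketched in a few lines (the violation signal below is synthetic, and the policy update itself is elided): each behavioral constraint gets a Lagrange multiplier updated by dual ascent.

```python
import numpy as np

lam, lr, budget = 0.0, 0.05, 0.1     # multiplier, dual step size, cost budget
for epoch in range(100):
    violation = float(np.exp(-0.05 * epoch))   # stand-in for a measured cost
    # the policy would maximize  reward - lam * cost  at this point
    lam = max(0.0, lam + lr * (violation - budget))   # dual ascent on lam
print(round(lam, 3))
```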

Neural Algorithmic Reasoners are Implicit Planners

no code implementations NeurIPS 2021 Andreea Deac, Petar Veličković, Ognjen Milinković, Pierre-Luc Bacon, Jian Tang, Mladen Nikolić

We find that prior approaches either assume that the environment is provided in such a tabular form -- which is highly restrictive -- or infer "local neighbourhoods" of states to run value iteration over -- for which we discover an algorithmic bottleneck effect.

Self-Supervised Learning

DistProp: A Scalable Approach to Lagrangian Training via Distributional Approximation

no code implementations29 Sep 2021 Manuel Del Verme, Pierre-Luc Bacon

We develop a multiple shooting method for learning in deep neural networks based on the Lagrangian perspective on automatic differentiation.

XLVIN: eXecuted Latent Value Iteration Nets

no code implementations25 Oct 2020 Andreea Deac, Petar Veličković, Ognjen Milinković, Pierre-Luc Bacon, Jian Tang, Mladen Nikolić

Value Iteration Networks (VINs) have emerged as a popular method to incorporate planning algorithms within deep reinforcement learning, enabling performance improvements on tasks requiring long-range reasoning and understanding of environment dynamics.

Graph Representation Learning · Self-Supervised Learning
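For context, the classical algorithm that VINs and XLVIN emulate with neural components, in its plain tabular form on an assumed random MDP:

```python
import numpy as np

rng = np.random.default_rng(7)
n_s, n_a, gamma = 6, 2, 0.9
R = rng.normal(size=(n_s, n_a))
P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))   # P[s, a] over next states

V = np.zeros(n_s)
for _ in range(200):
    V = (R + gamma * P @ V).max(axis=1)   # Bellman optimality backup
print(V.round(2))
```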

Graph neural induction of value iteration

no code implementations26 Sep 2020 Andreea Deac, Pierre-Luc Bacon, Jian Tang

Previously, such planning components have been incorporated through a neural network that partially aligns with the computational graph of value iteration.

Reinforcement Learning (RL)

TDprop: Does Jacobi Preconditioning Help Temporal Difference Learning?

no code implementations6 Jul 2020 Joshua Romoff, Peter Henderson, David Kanaa, Emmanuel Bengio, Ahmed Touati, Pierre-Luc Bacon, Joelle Pineau

We investigate whether Jacobi preconditioning, accounting for the bootstrap term in temporal difference (TD) learning, can help boost performance of adaptive optimizers.
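A sketch of what Jacobi preconditioning of TD can look like in the linear case (my construction, not the paper's TDprop algorithm): estimate the diagonal of the expected update operator, which includes the bootstrap term rather than just a squared gradient, and divide the TD step by it.

```python
import numpy as np

rng = np.random.default_rng(9)
d, gamma, lr = 5, 0.9, 0.1
theta_true = rng.normal(size=d)          # defines the TD fixed point
w, diag_est = np.zeros(d), np.ones(d)

for _ in range(3000):
    phi, phi_next = rng.normal(size=d), rng.normal(size=d)
    r = phi @ theta_true - gamma * (phi_next @ theta_true)
    td = r + gamma * (phi_next @ w) - phi @ w
    # Jacobi preconditioner: running diagonal of E[phi (phi - gamma phi')^T],
    # i.e. the update operator with its bootstrap term included
    diag_est = 0.999 * diag_est + 0.001 * phi * (phi - gamma * phi_next)
    w += lr * td * phi / np.maximum(diag_est, 0.1)

print(np.abs(w - theta_true).max())      # -> small: converged to the fixed point
```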

Policy Evaluation Networks

no code implementations26 Feb 2020 Jean Harb, Tom Schaul, Doina Precup, Pierre-Luc Bacon

The core idea of this paper is to flip this convention and estimate the value of many policies, for a single set of states.
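The flipped convention can be sketched with a toy regression in place of a deep network (the quadratic "value" and the feature map are assumptions): learn a map from a policy's parameters, its fingerprint, to its value, then query it for unseen policies.

```python
import numpy as np

rng = np.random.default_rng(10)

def true_value(theta):                   # stand-in for an expensive evaluation
    return -((theta - 0.5)**2).sum()

thetas = rng.normal(size=(100, 6))       # a population of policies
values = np.array([true_value(t) for t in thetas])
F = np.column_stack([thetas, thetas**2, np.ones(len(thetas))])  # fingerprints
w, *_ = np.linalg.lstsq(F, values, rcond=None)

probe = rng.normal(size=6)               # an unseen policy
f = np.concatenate([probe, probe**2, [1.0]])
print(f @ w, true_value(probe))          # predicted vs actual value
```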

Options of Interest: Temporal Abstraction with Interest Functions

3 code implementations1 Jan 2020 Khimya Khetarpal, Martin Klissarov, Maxime Chevalier-Boisvert, Pierre-Luc Bacon, Doina Precup

Temporal abstraction refers to the ability of an agent to use behaviours of controllers which act for a limited, variable amount of time.

Entropy Regularization with Discounted Future State Distribution in Policy Gradient Methods

no code implementations11 Dec 2019 Riashat Islam, Raihan Seraj, Pierre-Luc Bacon, Doina Precup

In this work, we propose exploration in policy gradient methods based on maximizing entropy of the discounted future state distribution.

Policy Gradient Methods
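A tabular sketch of the objective's central quantity, under a fixed policy on an assumed random chain: the discounted future state distribution has a closed form, and its entropy is what the proposed exploration bonus targets.

```python
import numpy as np

rng = np.random.default_rng(11)
n, gamma = 5, 0.9
P = rng.dirichlet(np.ones(n), size=n)    # state chain induced by the policy
mu = np.ones(n) / n                      # start-state distribution

# discounted future state distribution: d = (1 - gamma) mu^T (I - gamma P)^-1
d = (1 - gamma) * mu @ np.linalg.inv(np.eye(n) - gamma * P)
print(d.round(3), round(-(d * np.log(d)).sum(), 3))   # d and its entropy
```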

All-Action Policy Gradient Methods: A Numerical Integration Approach

no code implementations21 Oct 2019 Benjamin Petit, Loren Amdahl-Culleton, Yao Liu, Jimmy Smith, Pierre-Luc Bacon

While often stated as an instance of the likelihood ratio trick [Rubinstein, 1989], the original policy gradient theorem [Sutton, 1999] involves an integral over the action space.

Continuous Control · Numerical Integration · +1
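A one-dimensional sketch of the all-action idea (toy critic and Gaussian policy, not the paper's estimators): replace the sampled likelihood-ratio term with numerical integration of Q(s, a) times the derivative of the policy density over the whole action space.

```python
import numpy as np

mu, sigma = 0.2, 0.5                     # Gaussian policy; theta is the mean
Q = lambda a: -(a - 1.0)**2              # stand-in critic

a = np.linspace(-4.0, 4.0, 4001)         # action grid for quadrature
pdf = np.exp(-(a - mu)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
dpdf_dmu = pdf * (a - mu) / sigma**2     # derivative of the density wrt mu
grad_all_action = np.sum(Q(a) * dpdf_dmu) * (a[1] - a[0])

# single-action likelihood-ratio estimator, for comparison
samp = np.random.default_rng(12).normal(mu, sigma, 200_000)
grad_mc = np.mean(Q(samp) * (samp - mu) / sigma**2)
print(round(grad_all_action, 3), round(grad_mc, 3))   # both ≈ 1.6
```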

Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling

no code implementations ICML 2020 Yao Liu, Pierre-Luc Bacon, Emma Brunskill

Surprisingly, we find that in finite-horizon MDPs there is no strict variance reduction from per-decision importance sampling or stationary importance sampling compared with vanilla importance sampling.

Off-policy evaluation
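The two estimators being compared can be sketched in a short toy chain (the rewards and policies below are assumptions): per-decision IS weights each reward by the ratio product only up to its time step, while vanilla IS uses the full-trajectory product. Both are unbiased; the paper's point is that their variance ordering is subtler than folklore suggests.

```python
import numpy as np

rng = np.random.default_rng(13)
T, n = 5, 100_000
b, pi = np.array([0.5, 0.5]), np.array([0.2, 0.8])   # behavior vs target
A = rng.integers(0, 2, size=(n, T))                  # actions from behavior
R = A + rng.normal(scale=0.1, size=(n, T))           # reward tracks the action

rho = (pi / b)[A]                        # per-step importance ratios
w_pd = np.cumprod(rho, axis=1)           # ratio product up to each step
pd_est = (w_pd * R).sum(axis=1).mean()               # per-decision IS
vanilla = (w_pd[:, -1:] * R).sum(axis=1).mean()      # full-trajectory IS
print(round(pd_est, 3), round(vanilla, 3))           # both estimate ≈ 4.0
```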

The Barbados 2018 List of Open Issues in Continual Learning

no code implementations16 Nov 2018 Tom Schaul, Hado van Hasselt, Joseph Modayil, Martha White, Adam White, Pierre-Luc Bacon, Jean Harb, Shibl Mourad, Marc Bellemare, Doina Precup

We want to make progress toward artificial general intelligence, namely general-purpose agents that autonomously learn how to competently act in complex environments.

Continual Learning

Learning Robust Options

no code implementations9 Feb 2018 Daniel J. Mankowitz, Timothy A. Mann, Pierre-Luc Bacon, Doina Precup, Shie Mannor

We present a Robust Options Policy Iteration (ROPI) algorithm with convergence guarantees, which learns options that are robust to model uncertainty.

Learnings Options End-to-End for Continuous Action Tasks

3 code implementations30 Nov 2017 Martin Klissarov, Pierre-Luc Bacon, Jean Harb, Doina Precup

We present new results on learning temporally extended actions for continuous tasks, using the options framework (Sutton et al. [1999b], Precup [2000]).

Learning with Options that Terminate Off-Policy

no code implementations10 Nov 2017 Anna Harutyunyan, Peter Vrancx, Pierre-Luc Bacon, Doina Precup, Ann Nowe

Generally, learning with longer options (like learning with multi-step returns) is known to be more efficient.

When Waiting is not an Option : Learning Options with a Deliberation Cost

1 code implementation14 Sep 2017 Jean Harb, Pierre-Luc Bacon, Martin Klissarov, Doina Precup

Recent work has shown that temporally extended actions (options) can be learned fully end-to-end as opposed to being specified in advance.

Atari Games

Convergent Tree Backup and Retrace with Function Approximation

no code implementations ICML 2018 Ahmed Touati, Pierre-Luc Bacon, Doina Precup, Pascal Vincent

Off-policy learning is key to scaling up reinforcement learning, as it allows learning about a target policy from the experience generated by a different behavior policy.
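The truncated traces at the heart of Retrace, one of the algorithms analyzed here, can be shown in two lines (random ratios, illustrative only): cutting each importance ratio at 1 keeps the cumulative products, and hence the off-policy correction, bounded.

```python
import numpy as np

rng = np.random.default_rng(14)
lam = 0.9
rho = rng.uniform(0, 3, size=10)        # per-step importance ratios pi/b
c = lam * np.minimum(1.0, rho)          # Retrace truncation: c_t = lam*min(1, rho_t)
print(np.cumprod(c).round(3))           # bounded products => bounded corrections
```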

A Matrix Splitting Perspective on Planning with Options

no code implementations3 Dec 2016 Pierre-Luc Bacon, Doina Precup

We show that the Bellman operator underlying the options framework leads to a matrix splitting, an approach traditionally used to speed up convergence of iterative solvers for large linear systems of equations.
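A miniature of the correspondence: solve the policy-evaluation system (I − γP)v = r with a splitting A = M − N and the fixed-point iteration v ← M⁻¹(Nv + r). Jacobi is used below purely for simplicity; the paper's point is that options induce such a splitting.

```python
import numpy as np

rng = np.random.default_rng(15)
n, gamma = 6, 0.9
P = rng.dirichlet(np.ones(n), size=n)    # policy-induced transition matrix
r = rng.normal(size=n)
A = np.eye(n) - gamma * P                # policy evaluation system A v = r

M = np.diag(np.diag(A)); N = M - A       # splitting A = M - N (Jacobi)
v = np.zeros(n)
for _ in range(300):
    v = np.linalg.solve(M, N @ v + r)    # v <- M^{-1}(N v + r)
print(np.abs(A @ v - r).max())           # ≈ 0: the splitting iteration solved it
```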

The Option-Critic Architecture

9 code implementations16 Sep 2016 Pierre-Luc Bacon, Jean Harb, Doina Precup

Temporal abstraction is key to scaling up learning and planning in reinforcement learning.

Reinforcement Learning (RL)
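For readers new to options, a toy call-and-return execution loop of the kind option-critic trains end-to-end (all components below are untrained stand-ins): a policy over options picks an option, its intra-option policy acts, and the termination function decides when to switch.

```python
import numpy as np

rng = np.random.default_rng(16)
n_opt = 3
beta = lambda s, o: 0.2                      # termination probability (stub)
intra = lambda s, o: int(rng.integers(0, 2)) # intra-option policy (stub)
policy_over_options = lambda s: int(rng.integers(0, n_opt))

s, o, trace = 0, None, []
for t in range(12):
    if o is None or rng.random() < beta(s, o):
        o = policy_over_options(s)           # call: choose a new option
    a = intra(s, o)                          # act with the current option
    trace.append((t, o, a))
    s = (s + a) % 5                          # toy environment dynamics
print(trace)
```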
