Search Results for author: Will Dabney

Found 46 papers, 12 papers with code

Disentangling the Causes of Plasticity Loss in Neural Networks

no code implementations 29 Feb 2024 Clare Lyle, Zeyu Zheng, Khimya Khetarpal, Hado van Hasselt, Razvan Pascanu, James Martens, Will Dabney

Underpinning the past decades of work on the design, initialization, and optimization of neural networks is a seemingly innocuous assumption: that the network is trained on a stationary data distribution.

Atari Games · reinforcement-learning

A Distributional Analogue to the Successor Representation

1 code implementation 13 Feb 2024 Harley Wiltzer, Jesse Farebrother, Arthur Gretton, Yunhao Tang, André Barreto, Will Dabney, Marc G. Bellemare, Mark Rowland

This paper contributes a new approach for distributional reinforcement learning which elucidates a clean separation of transition structure and reward in the learning process.

Distributional Reinforcement Learning · Model-based Reinforcement Learning · +1
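
For orientation, the classical successor representation is the expected discounted occupancy of future state features, while the distributional analogue studied here tracks the law of the random occupancy measure itself. In rough LaTeX notation (mine, not the paper's):

    % Successor representation: expected discounted feature occupancy.
    \psi^{\pi}(s) = \mathbb{E}_{\pi}\Big[ \sum_{t=0}^{\infty} \gamma^{t} \phi(S_t) \,\Big|\, S_0 = s \Big]
    % Distributional analogue: the distribution of the normalized random
    % occupancy measure, not just its expectation.
    M^{\pi}(s) = \mathrm{Law}\Big( (1 - \gamma) \sum_{t=0}^{\infty} \gamma^{t} \delta_{S_t} \,\Big|\, S_0 = s \Big)

Coupling such an object with a reward function recovers return distributions, which is what enables the clean separation of transition structure and reward mentioned in the abstract.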

Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model

no code implementations 12 Feb 2024 Mark Rowland, Li Kevin Wenliang, Rémi Munos, Clare Lyle, Yunhao Tang, Will Dabney

We propose a new algorithm for model-based distributional reinforcement learning (RL), and prove that it is minimax-optimal for approximating return distributions with a generative model (up to logarithmic factors), resolving an open question of Zhang et al. (2023).

Distributional Reinforcement Learning · reinforcement-learning · +1

Off-policy Distributional Q(λ): Distributional RL without Importance Sampling

no code implementations 8 Feb 2024 Yunhao Tang, Mark Rowland, Rémi Munos, Bernardo Ávila Pires, Will Dabney

We introduce off-policy distributional Q(λ), a new addition to the family of off-policy distributional evaluation algorithms.

Bootstrapped Representations in Reinforcement Learning

no code implementations 16 Jun 2023 Charline Le Lan, Stephen Tu, Mark Rowland, Anna Harutyunyan, Rishabh Agarwal, Marc G. Bellemare, Will Dabney

In this paper, we address this gap and provide a theoretical characterization of the state representation learnt by temporal difference learning (Sutton, 1988).

Auxiliary Learning · reinforcement-learning · +1

Understanding plasticity in neural networks

no code implementations 2 Mar 2023 Clare Lyle, Zeyu Zheng, Evgenii Nikishin, Bernardo Avila Pires, Razvan Pascanu, Will Dabney

Plasticity, the ability of a neural network to quickly change its predictions in response to new information, is essential for the adaptability and robustness of deep reinforcement learning systems.

Atari Games

An Analysis of Quantile Temporal-Difference Learning

no code implementations 11 Jan 2023 Mark Rowland, Rémi Munos, Mohammad Gheshlaghi Azar, Yunhao Tang, Georg Ostrovski, Anna Harutyunyan, Karl Tuyls, Marc G. Bellemare, Will Dabney

We analyse quantile temporal-difference learning (QTD), a distributional reinforcement learning algorithm that has proven to be a key component in several successful large-scale applications of reinforcement learning.

Distributional Reinforcement Learning · reinforcement-learning · +1
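
As a concrete illustration of the algorithm under analysis, the tabular QTD(0) update for policy evaluation can be sketched in a few lines of numpy; the names, shapes, and the averaging over bootstrap targets below are my own rendering, not the authors' code.

    import numpy as np

    def qtd_update(theta, s, r, gamma, s_next, alpha):
        """One tabular QTD(0) update of the m quantile estimates at state s.

        theta: array of shape [num_states, m] of per-state quantile estimates.
        """
        m = theta.shape[1]
        tau = (2 * np.arange(m) + 1) / (2 * m)      # quantile midpoints tau_i
        targets = r + gamma * theta[s_next]         # m bootstrapped sample targets
        # Quantile i moves up with weight tau_i when below a target,
        # and down with weight (1 - tau_i) when above it.
        below = (targets[None, :] < theta[s][:, None]).astype(float)
        theta[s] += alpha * (tau - below.mean(axis=1))

Unlike classical TD, the size of each update depends only on the sign of the errors, not their magnitude, one source of the robustness properties the analysis makes precise.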

Settling the Reward Hypothesis

no code implementations 20 Dec 2022 Michael Bowling, John D. Martin, David Abel, Will Dabney

The reward hypothesis posits that, "all of what we mean by goals and purposes can be well thought of as maximization of the expected value of the cumulative sum of a received scalar signal (reward)."
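
In the standard discounted formalization (one common reading of the hypothesis; the notation here is mine), the claim is that every goal can be expressed as the objective

    \max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} R_{t+1} \right], \qquad \gamma \in [0, 1),

for some scalar reward signal R; the paper examines the conditions under which this is and is not possible.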

Generalised Policy Improvement with Geometric Policy Composition

no code implementations 17 Jun 2022 Shantanu Thakoor, Mark Rowland, Diana Borsa, Will Dabney, Rémi Munos, André Barreto

We introduce a method for policy improvement that interpolates between the greedy approach of value-based reinforcement learning (RL) and the full planning approach typical of model-based RL.

Continuous Control · Reinforcement Learning (RL)

Learning Dynamics and Generalization in Reinforcement Learning

no code implementations 5 Jun 2022 Clare Lyle, Mark Rowland, Will Dabney, Marta Kwiatkowska, Yarin Gal

Solving a reinforcement learning (RL) problem poses two competing challenges: fitting a potentially discontinuous value function, and generalizing well to new observations.

Policy Gradient Methods · reinforcement-learning · +1

Understanding and Preventing Capacity Loss in Reinforcement Learning

no code implementations ICLR 2022 Clare Lyle, Mark Rowland, Will Dabney

The reinforcement learning (RL) problem is rife with sources of non-stationarity, making it a notoriously difficult problem domain for the application of neural networks.

Montezuma's Revenge · reinforcement-learning · +1

On the Expressivity of Markov Reward

no code implementations NeurIPS 2021 David Abel, Will Dabney, Anna Harutyunyan, Mark K. Ho, Michael L. Littman, Doina Precup, Satinder Singh

We then provide a set of polynomial-time algorithms that construct a Markov reward function that allows an agent to optimize tasks of each of these three types, and correctly determine when no such reward function exists.

The Difficulty of Passive Learning in Deep Reinforcement Learning

1 code implementation NeurIPS 2021 Georg Ostrovski, Pablo Samuel Castro, Will Dabney

Learning to act from observational data without active environmental interaction is a well-known challenge in Reinforcement Learning (RL).

reinforcement-learning · Reinforcement Learning (RL)

Revisiting Peng's Q(λ) for Modern Reinforcement Learning

no code implementations 27 Feb 2021 Tadashi Kozuno, Yunhao Tang, Mark Rowland, Rémi Munos, Steven Kapturowski, Will Dabney, Michal Valko, David Abel

These results indicate that Peng's Q(λ), which was thought to be unsafe, is a theoretically sound and practically effective algorithm.

Continuous Control · reinforcement-learning · +1
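
For reference, Peng's Q(λ) targets interpolate between the one-step Q-learning backup (λ = 0) and the Monte Carlo return (λ = 1) through a backward recursion; a minimal numpy sketch, with my own notation and assuming q_next_max is zero after terminal states.

    import numpy as np

    def pengs_q_lambda_targets(rewards, q_next_max, gamma, lam):
        """Backward recursion for Peng's Q(lambda) along one trajectory.

        rewards:    r_0 .. r_{T-1}
        q_next_max: max_a Q(s_{t+1}, a) for t = 0 .. T-1 (zero at terminal).
        """
        T = len(rewards)
        g = np.empty(T)
        g[T - 1] = rewards[T - 1] + gamma * q_next_max[T - 1]
        for t in range(T - 2, -1, -1):
            # Mix the bootstrapped backup with the continuing lambda-return.
            g[t] = rewards[t] + gamma * ((1 - lam) * q_next_max[t] + lam * g[t + 1])
        return g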

On The Effect of Auxiliary Tasks on Representation Dynamics

no code implementations 25 Feb 2021 Clare Lyle, Mark Rowland, Georg Ostrovski, Will Dabney

While auxiliary tasks play a key role in shaping the representations learnt by reinforcement learning agents, much is still unknown about the mechanisms through which this is achieved.

reinforcement-learning · Reinforcement Learning (RL)

Revisiting Fundamentals of Experience Replay

2 code implementations ICML 2020 William Fedus, Prajit Ramachandran, Rishabh Agarwal, Yoshua Bengio, Hugo Larochelle, Mark Rowland, Will Dabney

Experience replay is central to off-policy algorithms in deep reinforcement learning (RL), but there remain significant gaps in our understanding.

DQN Replay Dataset · Q-Learning · +1
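
Two of the fundamentals the paper revisits, how much history the buffer keeps and how heavily it is sampled relative to how quickly it is refilled, map onto simple knobs in a standard FIFO buffer. A minimal sketch for orientation, not the paper's implementation:

    import collections
    import random

    class ReplayBuffer:
        """Minimal FIFO experience replay.

        capacity bounds how old the oldest replayed transition can be;
        the ratio of sample() to add() calls sets the replay ratio.
        """
        def __init__(self, capacity):
            self.storage = collections.deque(maxlen=capacity)

        def add(self, transition):
            self.storage.append(transition)

        def sample(self, batch_size):
            return random.sample(self.storage, batch_size)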

Temporally-Extended ε-Greedy Exploration

no code implementations ICLR 2021 Will Dabney, Georg Ostrovski, André Barreto

Recent work on exploration in reinforcement learning (RL) has led to a series of increasingly complex solutions to the problem.

Reinforcement Learning (RL)
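
The proposed alternative is strikingly simple: with probability ε, commit to a single random action for a random, heavy-tailed number of steps rather than for one step. A minimal numpy sketch (the zeta-distributed duration is one of the options discussed; the interface and names are mine):

    import numpy as np

    def ez_greedy_step(q_values, epsilon, rng, persist):
        """One action selection step of temporally-extended epsilon-greedy.

        persist: dict carrying {'n': steps of commitment left, 'action': ...},
        initialized to {'n': 0, 'action': None}.
        """
        if persist['n'] > 0:                        # still committed: repeat
            persist['n'] -= 1
            return persist['action']
        if rng.random() < epsilon:                  # start a new excursion
            persist['n'] = int(rng.zipf(2.0)) - 1   # heavy-tailed duration
            persist['action'] = int(rng.integers(len(q_values)))
            return persist['action']
        return int(np.argmax(q_values))             # otherwise act greedily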

Adapting Behaviour for Learning Progress

no code implementations 14 Dec 2019 Tom Schaul, Diana Borsa, David Ding, David Szepesvari, Georg Ostrovski, Will Dabney, Simon Osindero

Determining what experience to generate to best facilitate learning (i.e., exploration) is one of the distinguishing features and open challenges in reinforcement learning.

Atari Games

Adaptive Trade-Offs in Off-Policy Learning

no code implementations 16 Oct 2019 Mark Rowland, Will Dabney, Rémi Munos

A great variety of off-policy learning algorithms exist in the literature, and new breakthroughs in this area continue to be made, improving theoretical understanding and yielding state-of-the-art reinforcement learning algorithms.

Off-policy evaluation · reinforcement-learning

Fast Task Inference with Variational Intrinsic Successor Features

no code implementations ICLR 2020 Steven Hansen, Will Dabney, Andre Barreto, Tom Van de Wiele, David Warde-Farley, Volodymyr Mnih

It has been established that diverse behaviors spanning the controllable subspace of a Markov decision process can be trained by rewarding a policy for being distinguishable from other policies (Gregor et al., 2016; Eysenbach et al., 2018; Warde-Farley et al., 2018).

Recurrent Experience Replay in Distributed Reinforcement Learning

3 code implementations ICLR 2019 Steven Kapturowski, Georg Ostrovski, Will Dabney, John Quan, Remi Munos

Using a single network architecture and fixed set of hyperparameters, the resulting agent, Recurrent Replay Distributed DQN, quadruples the previous state of the art on Atari-57, and surpasses the state of the art on DMLab-30.

Atari Games · reinforcement-learning · +1
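
One detail behind these results is training from replayed sequences with stored recurrent states, using a burn-in prefix solely to refresh the state before losses are computed. A schematic sketch, with the net(obs, state) interface and the sequence split assumed for illustration:

    def unroll_with_burn_in(net, stored_state, obs_seq, burn_in):
        """Replay-sequence unroll: burn in the recurrent state, then learn."""
        state = stored_state
        for obs in obs_seq[:burn_in]:      # burn-in: refresh state, no loss
            _, state = net(obs, state)
        outputs = []
        for obs in obs_seq[burn_in:]:      # training portion of the sequence
            q, state = net(obs, state)
            outputs.append(q)
        return outputs                     # losses are computed on these only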

The Termination Critic

no code implementations 26 Feb 2019 Anna Harutyunyan, Will Dabney, Diana Borsa, Nicolas Heess, Remi Munos, Doina Precup

In this work, we consider the problem of autonomously discovering behavioral abstractions, or options, for reinforcement learning agents.

Statistics and Samples in Distributional Reinforcement Learning

no code implementations 21 Feb 2019 Mark Rowland, Robert Dadashi, Saurabh Kumar, Rémi Munos, Marc G. Bellemare, Will Dabney

We present a unifying framework for designing and analysing distributional reinforcement learning (DRL) algorithms in terms of recursively estimating statistics of the return distribution.

Distributional Reinforcement Learning · reinforcement-learning · +1

Autoregressive Quantile Networks for Generative Modeling

1 code implementation ICML 2018 Georg Ostrovski, Will Dabney, Rémi Munos

We introduce autoregressive implicit quantile networks (AIQN), a fundamentally different approach to generative modeling than those commonly used, that implicitly captures the distribution using quantile regression.

regression

Implicit Quantile Networks for Distributional Reinforcement Learning

20 code implementations ICML 2018 Will Dabney, Georg Ostrovski, David Silver, Rémi Munos

In this work, we build on recent advances in distributional reinforcement learning to give a generally applicable, flexible, and state-of-the-art distributional variant of DQN.

Atari Games · Distributional Reinforcement Learning · +3
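
The architectural core of IQN is to sample a quantile fraction τ ~ U(0, 1), embed it with a cosine basis followed by a learned linear layer and ReLU, and merge the embedding multiplicatively with the state features, training at the sampled τ with a quantile regression loss. A minimal numpy sketch of the embedding (parameter names are mine):

    import numpy as np

    def tau_embedding(tau, dim, weights, bias):
        """Cosine embedding of a sampled quantile fraction tau.

        phi(tau) = relu(W @ [cos(pi * i * tau)]_{i=0..dim-1} + b)
        """
        basis = np.cos(np.pi * np.arange(dim) * tau)    # fixed cosine features
        return np.maximum(0.0, weights @ basis + bias)  # learned projection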

Low-pass Recurrent Neural Networks - A memory architecture for longer-term correlation discovery

no code implementations 13 May 2018 Thomas Stepleton, Razvan Pascanu, Will Dabney, Siddhant M. Jayakumar, Hubert Soyer, Remi Munos

Reinforcement learning (RL) agents performing complex tasks must be able to remember observations and actions across sizable time intervals.

Reinforcement Learning (RL)
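
A schematic reading of the idea in the title (my own sketch, not the paper's exact architecture) is to maintain a bank of exponential moving averages of activations at several timescales, so that slower filters preserve older information:

    import numpy as np

    def lowpass_memory_step(memory, x, alphas):
        """Update low-pass filters of activation vector x at each timescale.

        memory: array [len(alphas), dim]; alphas: smoothing factors in (0, 1].
        """
        a = np.asarray(alphas)[:, None]
        return (1.0 - a) * memory + a * x[None, :]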

An Analysis of Categorical Distributional Reinforcement Learning

no code implementations 22 Feb 2018 Mark Rowland, Marc G. Bellemare, Will Dabney, Rémi Munos, Yee Whye Teh

Distributional approaches to value-based reinforcement learning model the entire distribution of returns, rather than just their expected values, and have recently been shown to yield state-of-the-art empirical performance.

Distributional Reinforcement Learning · reinforcement-learning · +1

Distributional Reinforcement Learning with Quantile Regression

17 code implementations 27 Oct 2017 Will Dabney, Mark Rowland, Marc G. Bellemare, Rémi Munos

In this paper, we build on recent work advocating a distributional approach to reinforcement learning in which the distribution over returns is modeled explicitly instead of only estimating the mean.

Atari Games · Distributional Reinforcement Learning · +3
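
Training uses the quantile regression loss, in practice its Huber-smoothed form; a minimal numpy sketch of the commonly cited definition (notation mine):

    import numpy as np

    def quantile_huber_loss(theta, targets, kappa=1.0):
        """Quantile Huber loss between m predicted quantiles and target samples.

        theta: [m] predicted quantile values; targets: [k] sampled returns.
        """
        m = len(theta)
        tau = (2 * np.arange(m) + 1) / (2 * m)     # quantile midpoints tau_i
        u = targets[None, :] - theta[:, None]      # pairwise TD errors [m, k]
        huber = np.where(np.abs(u) <= kappa,
                         0.5 * u ** 2,
                         kappa * (np.abs(u) - 0.5 * kappa))
        weight = np.abs(tau[:, None] - (u < 0.0))  # asymmetric quantile weight
        return (weight * huber / kappa).mean(axis=1).sum()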

A Distributional Perspective on Reinforcement Learning

22 code implementations ICML 2017 Marc G. Bellemare, Will Dabney, Rémi Munos

We obtain both state-of-the-art results and anecdotal evidence demonstrating the importance of the value distribution in approximate reinforcement learning.

Atari Games · reinforcement-learning · +1
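
The paper's C51 agent makes this concrete: the value distribution lives on a fixed support of N atoms, and each update projects the shifted target distribution r + γz back onto that support. A single-transition numpy sketch (notation mine, not the authors' code):

    import numpy as np

    def categorical_projection(probs_next, reward, gamma, v_min, v_max):
        """Project the target distribution of r + gamma * z onto the support.

        probs_next: [n] atom probabilities of the next-state distribution.
        """
        n = probs_next.shape[-1]
        z = np.linspace(v_min, v_max, n)                 # fixed support atoms
        dz = (v_max - v_min) / (n - 1)
        tz = np.clip(reward + gamma * z, v_min, v_max)   # shifted atoms
        b = (tz - v_min) / dz                            # fractional indices
        lo, hi = np.floor(b).astype(int), np.ceil(b).astype(int)
        out = np.zeros(n)
        np.add.at(out, lo, probs_next * (hi - b + (lo == hi)))  # lower share
        np.add.at(out, hi, probs_next * (b - lo))               # upper share
        return out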

The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning

no code implementations ICLR 2018 Audrunas Gruslys, Will Dabney, Mohammad Gheshlaghi Azar, Bilal Piot, Marc Bellemare, Remi Munos

Our first contribution is a new policy evaluation algorithm called Distributional Retrace, which brings multi-step off-policy updates to the distributional reinforcement learning setting.

Atari Games · Distributional Reinforcement Learning · +1

Proximal Reinforcement Learning: A New Theory of Sequential Decision Making in Primal-Dual Spaces

no code implementations 26 May 2014 Sridhar Mahadevan, Bo Liu, Philip Thomas, Will Dabney, Steve Giguere, Nicholas Jacek, Ian Gemp, Ji Liu

In this paper, we set forth a new vision of reinforcement learning developed by us over the past few years, one that yields mathematically rigorous solutions to longstanding important questions that have remained unresolved: (i) how to design reliable, convergent, and robust reinforcement learning algorithms; (ii) how to guarantee that reinforcement learning satisfies pre-specified "safety" guarantees, and remains in a stable region of the parameter space; (iii) how to design "off-policy" temporal difference learning algorithms in a reliable and stable manner; and finally (iv) how to integrate the study of reinforcement learning into the rich theory of stochastic optimization.

Decision Making · reinforcement-learning · +2
