no code implementations • 11 Jan 2023 • Mark Rowland, Rémi Munos, Mohammad Gheshlaghi Azar, Yunhao Tang, Georg Ostrovski, Anna Harutyunyan, Karl Tuyls, Marc G. Bellemare, Will Dabney
We analyse quantile temporal-difference learning (QTD), a distributional reinforcement learning algorithm that has proven to be a key component in several successful large-scale applications of reinforcement learning.
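The tabular QTD update analysed in this paper can be sketched as follows; this is a minimal illustrative version (state/variable names, the step size, and the quantile-midpoint choice are assumptions, not taken from the paper's notation):

```python
import random

def qtd_update(theta, s, r, s_next, gamma=0.99, alpha=0.05):
    """One quantile temporal-difference step.

    theta[s] is a list of m estimates of the quantiles of the return
    distribution from state s. Each estimate theta[s][i] is nudged up
    or down depending on whether a sampled bootstrapped target falls
    above or below it, as in quantile regression.
    """
    m = len(theta[s])
    taus = [(2 * i + 1) / (2 * m) for i in range(m)]  # quantile midpoints
    for i in range(m):
        # sample one quantile index of the target distribution
        j = random.randrange(m)
        target = r + gamma * theta[s_next][j]
        # asymmetric quantile-regression step: move up with weight tau_i,
        # down with weight (1 - tau_i)
        indicator = 1.0 if target < theta[s][i] else 0.0
        theta[s][i] += alpha * (taus[i] - indicator)
```

Unlike classical TD, the increment has fixed magnitude (it does not scale with the TD error), which is one reason the convergence analysis differs from the standard stochastic-approximation setting.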
no code implementations • 5 Jul 2022 • Caglar Gulcehre, Srivatsan Srinivasan, Jakub Sygnowski, Georg Ostrovski, Mehrdad Farajtabar, Matt Hoffman, Razvan Pascanu, Arnaud Doucet
Also, we empirically identify three phases of learning that explain the impact of implicit regularization on the learning dynamics, and find that bootstrapping alone is insufficient to explain the collapse of the effective rank.
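A common way to measure the effective-rank collapse discussed here is a spectral threshold on the feature matrix; the sketch below assumes that form of the measure (the exact definition and threshold used in the paper may differ):

```python
import numpy as np

def effective_rank(features, delta=0.01):
    """Effective rank of a feature matrix: the smallest k such that the
    top-k singular values capture a (1 - delta) fraction of the total
    spectrum. A collapse shows up as this number shrinking over training."""
    s = np.linalg.svd(features, compute_uv=False)
    cumulative = np.cumsum(s) / np.sum(s)
    return int(np.searchsorted(cumulative, 1.0 - delta) + 1)
```

For example, a full-rank identity feature matrix has effective rank equal to its dimension, while a rank-one matrix of constant features has effective rank 1.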
no code implementations • 1 Jun 2022 • Tom Schaul, André Barreto, John Quan, Georg Ostrovski
We identify and study the phenomenon of policy churn, that is, the rapid change of the greedy policy in value-based reinforcement learning.
1 code implementation • NeurIPS 2021 • Georg Ostrovski, Pablo Samuel Castro, Will Dabney
Learning to act from observational data without active environmental interaction is a well-known challenge in Reinforcement Learning (RL).
no code implementations • NeurIPS 2021 • Miruna Pîslar, David Szepesvari, Georg Ostrovski, Diana Borsa, Tom Schaul
Exploration remains a central challenge for reinforcement learning (RL).
no code implementations • 11 May 2021 • Tom Schaul, Georg Ostrovski, Iurii Kemaev, Diana Borsa
Scaling issues are mundane yet irritating for practitioners of reinforcement learning.
no code implementations • 25 Feb 2021 • Clare Lyle, Mark Rowland, Georg Ostrovski, Will Dabney
While auxiliary tasks play a key role in shaping the representations learnt by reinforcement learning agents, much is still unknown about the mechanisms through which this is achieved.
no code implementations • ICLR 2021 • Will Dabney, Georg Ostrovski, André Barreto
Recent work on exploration in reinforcement learning (RL) has led to a series of increasingly complex solutions to the problem.
no code implementations • 14 Dec 2019 • Tom Schaul, Diana Borsa, David Ding, David Szepesvari, Georg Ostrovski, Will Dabney, Simon Osindero
Determining what experience to generate to best facilitate learning (i.e. exploration) is one of the distinguishing features and open challenges in reinforcement learning.
3 code implementations • ICLR 2019 • Steven Kapturowski, Georg Ostrovski, Will Dabney, John Quan, Remi Munos
Using a single network architecture and fixed set of hyperparameters, the resulting agent, Recurrent Replay Distributed DQN, quadruples the previous state of the art on Atari-57, and surpasses the state of the art on DMLab-30.
Ranked #1 on Atari 2600 Pong (Atari Games)
1 code implementation • ICML 2018 • Georg Ostrovski, Will Dabney, Rémi Munos
We introduce autoregressive implicit quantile networks (AIQN), a fundamentally different approach to generative modeling than those commonly used, that implicitly captures the distribution using quantile regression.
20 code implementations • ICML 2018 • Will Dabney, Georg Ostrovski, David Silver, Rémi Munos
In this work, we build on recent advances in distributional reinforcement learning to give a generally applicable, flexible, and state-of-the-art distributional variant of DQN.
Ranked #1 on Atari 2600 Freeway (Atari Games)
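Quantile-based distributional variants of DQN such as this one train quantile estimates with an asymmetric Huber loss on the distributional TD error; a minimal single-sample sketch (parameter names and the default threshold are illustrative):

```python
def quantile_huber_loss(pred, target, tau, kappa=1.0):
    """Quantile Huber loss on the TD error u = target - pred.

    The Huber loss keeps gradients bounded near zero error, while the
    |tau - 1{u < 0}| weight makes the loss asymmetric, so that pred is
    driven toward the tau-quantile of the target distribution."""
    u = target - pred
    huber = 0.5 * u * u if abs(u) <= kappa else kappa * (abs(u) - 0.5 * kappa)
    return abs(tau - (1.0 if u < 0 else 0.0)) * huber
```

In the full algorithm this loss is averaged over pairs of sampled quantile levels for the prediction and the bootstrapped target.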
32 code implementations • 6 Oct 2017 • Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver
The deep reinforcement learning community has made several independent improvements to the DQN algorithm.
1 code implementation • ICML 2017 • Georg Ostrovski, Marc G. Bellemare, Aaron van den Oord, Remi Munos
This pseudo-count was used to generate an exploration bonus for a DQN agent and combined with a mixed Monte Carlo update was sufficient to achieve state of the art on the Atari 2600 game Montezuma's Revenge.
Ranked #9 on Atari 2600 Montezuma's Revenge (Atari Games)
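The pseudo-count construction behind this exploration bonus (introduced in the companion NeurIPS 2016 paper below) derives a visit-count surrogate from a density model's probabilities before and after observing a frame; a sketch, with the bonus scale and the small constant in the denominator as assumed illustrative values:

```python
import math

def pseudo_count(rho, rho_prime):
    """Pseudo-count implied by a density model that assigns probability
    rho to an observation before seeing it and rho_prime after one more
    observation of it. Assumes the model is learning-positive
    (rho_prime > rho)."""
    return rho * (1.0 - rho_prime) / (rho_prime - rho)

def exploration_bonus(rho, rho_prime, beta=0.05):
    # Count-based bonus that decays as the observation becomes familiar
    return beta / math.sqrt(pseudo_count(rho, rho_prime) + 0.01)
```

This bonus is added to the environment reward; the paper combines it with a mixed Monte Carlo update to propagate the sparse bonus signal quickly.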
1 code implementation • NeurIPS 2016 • Marc G. Bellemare, Sriram Srinivasan, Georg Ostrovski, Tom Schaul, David Saxton, Remi Munos
We consider an agent's uncertainty about its environment and the problem of generalizing this uncertainty across observations.
Ranked #10 on Atari 2600 Montezuma's Revenge (Atari Games)
2 code implementations • 15 Dec 2015 • Marc G. Bellemare, Georg Ostrovski, Arthur Guez, Philip S. Thomas, Rémi Munos
Extending the idea of a locally consistent operator, we then derive sufficient conditions for an operator to preserve optimality, leading to a family of operators which includes our consistent Bellman operator.
Ranked #1 on Atari 2600 Elevator Action (Atari Games)
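The consistent Bellman operator at the heart of this paper modifies the standard backup on self-loop transitions so that the action gap is increased while the optimal policy is preserved; a tabular sketch for a deterministic transition (data structures are illustrative):

```python
def consistent_backup(Q, x, a, r, x_next, gamma=0.99):
    """One application of a consistent Bellman operator to a
    deterministic transition (x, a) -> x_next.

    Off a self-loop this is the ordinary Bellman backup; on a self-loop
    the greedy value is reduced by the local action gap, i.e. the target
    bootstraps from Q(x, a) itself rather than max_a' Q(x, a')."""
    greedy = max(Q[x_next].values())
    if x_next == x:
        greedy -= max(Q[x].values()) - Q[x][a]
    return r + gamma * greedy
```

On self-loops this yields a strictly smaller target for non-greedy actions than the standard operator, which is what widens the action gap.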
7 code implementations • 25 Feb 2015 • Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis
We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters.
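At the core of the deep Q-network is a one-step bootstrapped target computed from a periodically frozen copy of the network; a minimal sketch of that target (function and parameter names are illustrative):

```python
def dqn_target(r, q_next, gamma=0.99, done=False):
    """DQN learning target for a transition with reward r.

    q_next holds the target network's Q-values at the next state; on
    terminal transitions the target is just the reward, otherwise the
    reward plus the discounted greedy value."""
    return r if done else r + gamma * max(q_next)
```

The online network is then regressed toward this target, with experience replay decorrelating the transitions used for each update.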