no code implementations • 8 Feb 2024 • Yunhao Tang, Zhaohan Daniel Guo, Zeyu Zheng, Daniele Calandriello, Rémi Munos, Mark Rowland, Pierre Harvey Richemond, Michal Valko, Bernardo Ávila Pires, Bilal Piot
Offline preference optimization allows fine-tuning large models directly from offline data, and has proved effective in recent alignment practices.
no code implementations • 8 Feb 2024 • Yunhao Tang, Mark Rowland, Rémi Munos, Bernardo Ávila Pires, Will Dabney
We introduce off-policy distributional Q($\lambda$), a new addition to the family of off-policy distributional evaluation algorithms.
no code implementations • 29 May 2023 • Yunhao Tang, Tadashi Kozuno, Mark Rowland, Anna Harutyunyan, Rémi Munos, Bernardo Ávila Pires, Michal Valko
Multi-step learning applies lookahead over multiple time steps and has proved valuable in policy evaluation settings.
no code implementations • 6 Dec 2022 • Yunhao Tang, Zhaohan Daniel Guo, Pierre Harvey Richemond, Bernardo Ávila Pires, Yash Chandak, Rémi Munos, Mark Rowland, Mohammad Gheshlaghi Azar, Charline Le Lan, Clare Lyle, András György, Shantanu Thakoor, Will Dabney, Bilal Piot, Daniele Calandriello, Michal Valko
We identify that faster-paced optimization of the predictor and semi-gradient updates on the representation are crucial to preventing representation collapse.
no code implementations • 15 Jul 2022 • Yunhao Tang, Mark Rowland, Rémi Munos, Bernardo Ávila Pires, Will Dabney, Marc G. Bellemare
We study the multi-step off-policy learning approach to distributional RL.
Distributional Reinforcement Learning • Reinforcement Learning
no code implementations • 20 Sep 2016 • Bernardo Ávila Pires, Csaba Szepesvári
We devise a streamlined analysis that simplifies the process of deriving calibration functions for a large number of surrogate losses that have been proposed in the literature.
no code implementations • 19 Feb 2016 • Bernardo Ávila Pires, Csaba Szepesvári
In this paper we study a model-based approach to calculating approximately optimal policies in Markovian Decision Processes.
Model-based Reinforcement Learning • Reinforcement Learning