Search Results for author: Bernardo Ávila Pires

Found 7 papers, 0 papers with code

Generalized Preference Optimization: A Unified Approach to Offline Alignment

no code implementations • 8 Feb 2024 • Yunhao Tang, Zhaohan Daniel Guo, Zeyu Zheng, Daniele Calandriello, Rémi Munos, Mark Rowland, Pierre Harvey Richemond, Michal Valko, Bernardo Ávila Pires, Bilal Piot

Offline preference optimization allows fine-tuning large models directly from offline data, and has proved effective in recent alignment practices.

Off-policy Distributional Q($\lambda$): Distributional RL without Importance Sampling

no code implementations • 8 Feb 2024 • Yunhao Tang, Mark Rowland, Rémi Munos, Bernardo Ávila Pires, Will Dabney

We introduce off-policy distributional Q($\lambda$), a new addition to the family of off-policy distributional evaluation algorithms.

DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm

no code implementations • 29 May 2023 • Yunhao Tang, Tadashi Kozuno, Mark Rowland, Anna Harutyunyan, Rémi Munos, Bernardo Ávila Pires, Michal Valko

Multi-step learning applies lookahead over multiple time steps and has proved valuable in policy evaluation settings.

Understanding Self-Predictive Learning for Reinforcement Learning

no code implementations • 6 Dec 2022 • Yunhao Tang, Zhaohan Daniel Guo, Pierre Harvey Richemond, Bernardo Ávila Pires, Yash Chandak, Rémi Munos, Mark Rowland, Mohammad Gheshlaghi Azar, Charline Le Lan, Clare Lyle, András György, Shantanu Thakoor, Will Dabney, Bilal Piot, Daniele Calandriello, Michal Valko

We identify that faster-paced optimization of the predictor and semi-gradient updates on the representation are crucial to preventing representation collapse.

reinforcement-learning Reinforcement Learning (RL) +1

The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning

no code implementations • 15 Jul 2022 • Yunhao Tang, Mark Rowland, Rémi Munos, Bernardo Ávila Pires, Will Dabney, Marc G. Bellemare

We study the multi-step off-policy learning approach to distributional RL.

Distributional Reinforcement Learning reinforcement-learning +1

Multiclass Classification Calibration Functions

no code implementations • 20 Sep 2016 • Bernardo Ávila Pires, Csaba Szepesvári

We devise a streamlined analysis that simplifies the process of deriving calibration functions for a large number of surrogate losses that have been proposed in the literature.

Classification General Classification

Policy Error Bounds for Model-Based Reinforcement Learning with Factored Linear Models

no code implementations • 19 Feb 2016 • Bernardo Ávila Pires, Csaba Szepesvári

In this paper we study a model-based approach to calculating approximately optimal policies in Markovian Decision Processes.

Model-based Reinforcement Learning reinforcement-learning +1
