Search Results for author: Thomas Coste

Found 3 papers, 1 paper with code

Bayesian Reward Models for LLM Alignment

no code implementations · 20 Feb 2024 · Adam X. Yang, Maxime Robeyns, Thomas Coste, Jun Wang, Haitham Bou-Ammar, Laurence Aitchison

To ensure that large language model (LLM) responses are helpful and non-toxic, we usually fine-tune a reward model on human preference data.

Language Modelling · Large Language Model
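
The abstract above refers to the standard practice of fine-tuning a reward model on human preference data. As a point of reference, below is a minimal sketch of generic pairwise preference-based reward modeling with a Bradley-Terry objective; it is not the paper's Bayesian method, and the module names and embedding shapes are hypothetical placeholders for a fine-tuned LLM backbone.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Toy scalar reward head over pooled response embeddings
    (hypothetical stand-in for a fine-tuned LLM backbone)."""
    def __init__(self, hidden_dim: int = 768):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, pooled_embedding: torch.Tensor) -> torch.Tensor:
        return self.score(pooled_embedding).squeeze(-1)

def preference_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry objective: maximize the log-probability that the
    # human-preferred ("chosen") response outscores the rejected one.
    return -nn.functional.logsigmoid(reward_chosen - reward_rejected).mean()

# Dummy embeddings standing in for encoded (prompt, response) pairs.
model = RewardModel()
chosen, rejected = torch.randn(8, 768), torch.randn(8, 768)
loss = preference_loss(model(chosen), model(rejected))
loss.backward()
```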

Reward Model Ensembles Help Mitigate Overoptimization

1 code implementation · 4 Oct 2023 · Thomas Coste, Usman Anwar, Robert Kirk, David Krueger

Gao et al. (2023) studied this phenomenon in a synthetic human feedback setup with a significantly larger "gold" reward model acting as the true reward (instead of humans) and showed that overoptimization remains a persistent problem regardless of the size of the proxy reward model and training data used.

Model Optimization
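
The paper's title points to ensembling reward models as a way to mitigate overoptimization of the proxy reward. Below is a minimal sketch of conservative ensemble aggregation under that general idea; the specific objectives, hyperparameters, and function names here are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def ensemble_reward(rewards: torch.Tensor, mode: str = "uncertainty", penalty: float = 1.0) -> torch.Tensor:
    """Combine per-member scores of shape (num_members, batch) into one
    conservative score per sample. Aggregation choices are illustrative."""
    if mode == "mean":
        return rewards.mean(dim=0)
    if mode == "worst_case":
        # Optimize against the most pessimistic ensemble member.
        return rewards.min(dim=0).values
    if mode == "uncertainty":
        # Penalize disagreement: high variance across members suggests the
        # proxy reward is being exploited rather than genuinely improved.
        return rewards.mean(dim=0) - penalty * rewards.std(dim=0)
    raise ValueError(f"unknown mode: {mode}")

# Example: 4 reward models scoring a batch of 8 candidate responses.
scores = torch.randn(4, 8)
print(ensemble_reward(scores, mode="uncertainty", penalty=0.5))
```

In a policy-optimization loop, the aggregated score would replace the single proxy reward, so that responses are only rewarded when the ensemble agrees they are good.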
