Search Results for author: Tomasz Korbak

Found 25 papers, 16 papers with code

Aligning language models with human preferences

1 code implementation • 18 Apr 2024 • Tomasz Korbak

In Chapter 3, I investigate the relation between two approaches to finetuning pretrained LMs using feedback given by a scoring function: reinforcement learning from human feedback (RLHF) and distribution matching.

Bayesian Inference

Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data

no code implementations • 1 Apr 2024 • Matthias Gerstgrasser, Rylan Schaeffer, Apratim Dey, Rafael Rafailov, Henry Sleight, John Hughes, Tomasz Korbak, Rajashree Agrawal, Dhruv Pai, Andrey Gromov, Daniel A. Roberts, Diyi Yang, David L. Donoho, Sanmi Koyejo

The proliferation of generative models, combined with pretraining on web-scale data, raises a timely question: what happens when these models are trained on their own generated outputs?

Image Generation

Towards Understanding Sycophancy in Language Models

1 code implementation • 20 Oct 2023 • Mrinank Sharma, Meg Tong, Tomasz Korbak, David Duvenaud, Amanda Askell, Samuel R. Bowman, Newton Cheng, Esin Durmus, Zac Hatfield-Dodds, Scott R. Johnston, Shauna Kravec, Timothy Maxwell, Sam McCandlish, Kamal Ndousse, Oliver Rausch, Nicholas Schiefer, Da Yan, Miranda Zhang, Ethan Perez

Overall, our results indicate that sycophancy is a general behavior of state-of-the-art AI assistants, likely driven in part by human preference judgments favoring sycophantic responses.

Text Generation

Compositional preference models for aligning LMs

1 code implementation • 17 Oct 2023 • Dongyoung Go, Tomasz Korbak, Germán Kruszewski, Jos Rozen, Marc Dymetman

As language models (LMs) become more capable, it is increasingly important to align them with human preferences.

The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"

2 code implementations • 21 Sep 2023 • Lukas Berglund, Meg Tong, Max Kaufmann, Mikita Balesni, Asa Cooper Stickland, Tomasz Korbak, Owain Evans

If a model is trained on a sentence of the form "A is B", it will not automatically generalize to the reverse direction "B is A".

Data Augmentation · Sentence
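A minimal sketch of how the asymmetry above can be probed, assuming a Hugging Face causal LM: score a fact and its reversal and compare total log-probabilities. The checkpoint (gpt2) and the celebrity fact are illustrative placeholders, not the paper's evaluation code; the paper's experiments fine-tune models on one direction of synthetic facts before testing the reverse.

```python
# Hypothetical reversal-curse probe (illustrative, not the paper's code):
# compare the log-probability a causal LM assigns to a fact stated in
# each direction. Model name and example fact are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any causal LM checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def total_logprob(text: str) -> float:
    """Log-probability of `text` under the model (first token excluded)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    # out.loss is the mean cross-entropy over the n-1 predicted tokens;
    # multiply back to get the summed log-probability.
    return -out.loss.item() * (ids.shape[1] - 1)

forward = total_logprob("Tom Cruise's mother is Mary Lee Pfeiffer.")
reverse = total_logprob("Mary Lee Pfeiffer's son is Tom Cruise.")
print(f"log p(A is B) = {forward:.2f}  vs  log p(B is A) = {reverse:.2f}")
```

In the paper's setup the gap appears after fine-tuning on one direction only; a pretrained checkpoint here merely illustrates the scoring mechanics.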

Improving Code Generation by Training with Natural Language Feedback

1 code implementation • 28 Mar 2023 • Angelica Chen, Jérémy Scheurer, Tomasz Korbak, Jon Ander Campos, Jun Shern Chan, Samuel R. Bowman, Kyunghyun Cho, Ethan Perez

The potential for pre-trained large language models (LLMs) to use natural language feedback at inference time has been an exciting recent development.

Code Generation · Imitation Learning · +1

Aligning Language Models with Preferences through f-divergence Minimization

1 code implementation • 16 Feb 2023 • Dongyoung Go, Tomasz Korbak, Germán Kruszewski, Jos Rozen, Nahyeon Ryu, Marc Dymetman

We show that Jensen-Shannon divergence strikes a good balance between these objectives, and frequently outperforms forward KL divergence by a wide margin, leading to significant improvements over prior work.
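For context, a sketch (notation mine, not necessarily the paper's) of the quantities the abstract compares, with the fine-tuned LM π_θ matched to a target distribution p:

```latex
% f-divergence between target p and model \pi_\theta, for convex f with f(1)=0:
\[
  D_f(p \,\|\, \pi_\theta)
    = \sum_x \pi_\theta(x)\, f\!\left(\frac{p(x)}{\pi_\theta(x)}\right).
\]
% Forward KL is the instance f(t) = t \log t; Jensen--Shannon is the
% symmetric, mixture-based case with m = (p + \pi_\theta)/2:
\[
  \mathrm{JS}(p \,\|\, \pi_\theta)
    = \tfrac{1}{2}\,\mathrm{KL}(p \,\|\, m) + \tfrac{1}{2}\,\mathrm{KL}(\pi_\theta \,\|\, m).
\]
```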

Pretraining Language Models with Human Preferences

1 code implementation • 16 Feb 2023 • Tomasz Korbak, Kejian Shi, Angelica Chen, Rasika Bhalerao, Christopher L. Buckley, Jason Phang, Samuel R. Bowman, Ethan Perez

Language models (LMs) are pretrained to imitate internet text, including content that would violate human preferences if generated by an LM: falsehoods, offensive comments, personally identifiable information, low-quality or buggy code, and more.

Imitation Learning · Language Modelling

On Reinforcement Learning and Distribution Matching for Fine-Tuning Language Models with no Catastrophic Forgetting

2 code implementations • 1 Jun 2022 • Tomasz Korbak, Hady Elsahar, Germán Kruszewski, Marc Dymetman

Here we explore the theoretical connections between the two paradigms, and show that methods such as KL-control, developed for reward maximization (RM), can also be construed as belonging to distribution matching (DM).

Language Modelling · Reinforcement Learning (RL) · +1

RL with KL penalties is better viewed as Bayesian inference

no code implementations • 23 May 2022 • Tomasz Korbak, Ethan Perez, Christopher L. Buckley

We show that KL-regularised RL is equivalent to variational inference: approximating a Bayesian posterior which specifies how to update a prior LM to conform with evidence provided by the reward function.

Bayesian Inference · Language Modelling · +2
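The equivalence stated in the abstract above can be summarized in two standard formulas (notation mine): the KL-regularised objective and its closed-form optimum, a Gibbs-style posterior that reweights the prior LM by exponentiated reward.

```latex
% KL-regularised RL objective over policies \pi, with prior LM \pi_0:
\[
  J(\pi) = \mathbb{E}_{x \sim \pi}[\, r(x) \,] - \beta\, \mathrm{KL}(\pi \,\|\, \pi_0),
\]
% whose maximizer is the Bayesian posterior the abstract refers to:
\[
  \pi^*(x) = \frac{1}{Z}\, \pi_0(x)\, \exp\!\left(\frac{r(x)}{\beta}\right),
  \qquad
  Z = \sum_x \pi_0(x)\, \exp\!\left(\frac{r(x)}{\beta}\right).
\]
```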

A continuity of Markov blanket interpretations under the Free Energy Principle

no code implementations • 18 Jan 2022 • Anil Seth, Tomasz Korbak, Alexander Tschantz

Bruineberg and colleagues helpfully distinguish between instrumental and ontological interpretations of Markov blankets, exposing the dangers of using the former to make claims about the latter.

Controlling Conditional Language Models without Catastrophic Forgetting

1 code implementation • 1 Dec 2021 • Tomasz Korbak, Hady Elsahar, Germán Kruszewski, Marc Dymetman

Machine learning is shifting towards general-purpose pretrained generative models, trained in a self-supervised manner on large amounts of data, which can then be applied to solve a large number of tasks.

Abstractive Text Summarization · Code Generation

On Reward Maximization and Distribution Matching for Fine-Tuning Language Models

no code implementations • 29 Sep 2021 • Tomasz Korbak, Hady Elsahar, Germán Kruszewski, Marc Dymetman

The availability of large pre-trained models is changing the landscape of Machine Learning research and practice, moving from a "training from scratch" to a "fine-tuning" paradigm.

Language Modelling · Reinforcement Learning (RL) · +1

Energy-Based Models for Code Generation under Compilability Constraints

1 code implementation • 9 Jun 2021 • Tomasz Korbak, Hady Elsahar, Marc Dymetman, Germán Kruszewski

Neural language models can be successfully trained on source code, leading to applications such as code completion.

Code Completion · Code Generation

Measuring non-trivial compositionality in emergent communication

1 code implementation • 28 Oct 2020 • Tomasz Korbak, Julian Zubek, Joanna Rączaszek-Leonardi

Compositionality is an important explanatory target in emergent communication and language evolution.

Fine-tuning Tree-LSTM for phrase-level sentiment classification on a Polish dependency treebank. Submission to PolEval task 2

1 code implementation • 3 Nov 2017 • Tomasz Korbak, Paulina Żak

We describe a variant of the Child-Sum Tree-LSTM deep neural network (Tai et al., 2015) fine-tuned for working with dependency trees and morphologically rich languages, using the example of Polish.

General Classification · Sentiment Analysis · +3
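For reference, the Child-Sum Tree-LSTM update from Tai et al. (2015) that the submission above adapts; node j has input x_j, children C(j), and gates computed from the summed child hidden states:

```latex
% Child-Sum Tree-LSTM (Tai et al., 2015); \odot is the elementwise product.
\begin{align*}
  \tilde{h}_j &= \textstyle\sum_{k \in C(j)} h_k \\
  i_j &= \sigma\big(W^{(i)} x_j + U^{(i)} \tilde{h}_j + b^{(i)}\big) \\
  f_{jk} &= \sigma\big(W^{(f)} x_j + U^{(f)} h_k + b^{(f)}\big) \\
  o_j &= \sigma\big(W^{(o)} x_j + U^{(o)} \tilde{h}_j + b^{(o)}\big) \\
  u_j &= \tanh\big(W^{(u)} x_j + U^{(u)} \tilde{h}_j + b^{(u)}\big) \\
  c_j &= i_j \odot u_j + \textstyle\sum_{k \in C(j)} f_{jk} \odot c_k \\
  h_j &= o_j \odot \tanh(c_j)
\end{align*}
```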
