Search Results for author: Fabien Roger

Found 5 papers, 3 papers with code

AI Control: Improving Safety Despite Intentional Subversion

no code implementations • 12 Dec 2023 • Ryan Greenblatt, Buck Shlegeris, Kshitij Sachan, Fabien Roger

One protocol asks GPT-4 to write code, then asks another instance of GPT-4 whether the code is backdoored, using various techniques to prevent the two instances from colluding.


Preventing Language Models From Hiding Their Reasoning

1 code implementation • 27 Oct 2023 • Fabien Roger, Ryan Greenblatt

Large language models (LLMs) often benefit from intermediate steps of reasoning to generate answers to complex problems.


Benchmarks for Detecting Measurement Tampering

1 code implementation • 29 Aug 2023 • Fabien Roger, Ryan Greenblatt, Max Nadeau, Buck Shlegeris, Nate Thomas

When training powerful AI systems to perform complex tasks, it may be challenging to provide training signals which are robust to optimization.


Large Language Models Sometimes Generate Purely Negatively-Reinforced Text

1 code implementation • 13 Jun 2023 • Fabien Roger

When using adversarial training, it is common practice to train against the most egregious failures.


Language models are better than humans at next-token prediction

no code implementations • 21 Dec 2022 • Buck Shlegeris, Fabien Roger, Lawrence Chan, Euan McLean

Current language models are considered to have sub-human capabilities at natural language tasks like question-answering or writing code.

Question Answering

