Search Results for author: Fabien Roger

Found 5 papers, 3 papers with code

AI Control: Improving Safety Despite Intentional Subversion

no code implementations · 12 Dec 2023 · Ryan Greenblatt, Buck Shlegeris, Kshitij Sachan, Fabien Roger

This protocol asks GPT-4 to write code, and then asks another instance of GPT-4 whether the code is backdoored, using various techniques to prevent the GPT-4 instances from colluding.

Preventing Language Models From Hiding Their Reasoning

1 code implementation · 27 Oct 2023 · Fabien Roger, Ryan Greenblatt

Large language models (LLMs) often benefit from intermediate steps of reasoning to generate answers to complex problems.

Benchmarks for Detecting Measurement Tampering

1 code implementation · 29 Aug 2023 · Fabien Roger, Ryan Greenblatt, Max Nadeau, Buck Shlegeris, Nate Thomas

When training powerful AI systems to perform complex tasks, it may be challenging to provide training signals which are robust to optimization.

Large Language Models Sometimes Generate Purely Negatively-Reinforced Text

1 code implementation · 13 Jun 2023 · Fabien Roger

When using adversarial training, it is common practice to train against the most egregious failures.

Language models are better than humans at next-token prediction

no code implementations · 21 Dec 2022 · Buck Shlegeris, Fabien Roger, Lawrence Chan, Euan McLean

Current language models are considered to have sub-human capabilities at natural language tasks like question-answering or writing code.

Question Answering
