no code implementations • 12 Dec 2023 • Ryan Greenblatt, Buck Shlegeris, Kshitij Sachan, Fabien Roger
This protocol asks GPT-4 to write code, and then asks another instance of GPT-4 whether the code is backdoored, using various techniques to prevent the GPT-4 instances from colluding.
1 code implementation • 27 Oct 2023 • Fabien Roger, Ryan Greenblatt
Large language models (LLMs) often benefit from intermediate steps of reasoning to generate answers to complex problems.
1 code implementation • 29 Aug 2023 • Fabien Roger, Ryan Greenblatt, Max Nadeau, Buck Shlegeris, Nate Thomas
When training powerful AI systems to perform complex tasks, it may be challenging to provide training signals which are robust to optimization.
1 code implementation • 13 Jun 2023 • Fabien Roger
When using adversarial training, it is common practice to train against the most egregious failures.
no code implementations • 21 Dec 2022 • Buck Shlegeris, Fabien Roger, Lawrence Chan, Euan McLean
Current language models are considered to have sub-human capabilities at natural language tasks like question-answering or writing code.