Search Results for author: Daniel M. Ziegler

Found 7 papers, 4 papers with code

Adversarial Training for High-Stakes Reliability

no code implementations • 3 May 2022 • Daniel M. Ziegler, Seraphina Nix, Lawrence Chan, Tim Bauman, Peter Schmidt-Nielsen, Tao Lin, Adam Scherlis, Noa Nabeshima, Ben Weinstein-Raun, Daniel de Haas, Buck Shlegeris, Nate Thomas

We found that adversarial training increased robustness to the adversarial attacks that we trained on -- doubling the time for our contractors to find adversarial examples both with our tool (from 13 to 26 minutes) and without (from 20 to 44 minutes) -- without affecting in-distribution performance.

Text Generation • Vocal Bursts Intensity Prediction

Recursively Summarizing Books with Human Feedback

no code implementations • 22 Sep 2021 • Jeff Wu, Long Ouyang, Daniel M. Ziegler, Nisan Stiennon, Ryan Lowe, Jan Leike, Paul Christiano

Our human labelers are able to supervise and evaluate the models quickly, despite not having read the entire books themselves.

Abstractive Text Summarization • Question Answering

Learning to summarize from human feedback

1 code implementation • 2 Sep 2020 • Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel M. Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano

We collect a large, high-quality dataset of human comparisons between summaries, train a model to predict the human-preferred summary, and use that model as a reward function to fine-tune a summarization policy using reinforcement learning.
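The pipeline described above trains a reward model on pairwise human comparisons. A minimal, dependency-free sketch of that comparison objective is below; the toy linear "model", the feature vectors, and the finite-difference gradient step are illustrative assumptions, not the paper's actual architecture or training procedure.

```python
# Sketch of a pairwise (Bradley-Terry style) comparison loss for a reward model.
# The linear reward function and random "summary features" are stand-ins for a
# real encoder; only the loss shape mirrors the method described above.
import math
import random

def reward(weights, features):
    """Toy linear reward model: scalar score for one summary's feature vector."""
    return sum(w * f for w, f in zip(weights, features))

def preference_loss(weights, preferred, rejected):
    """-log sigmoid(r_preferred - r_rejected): minimizing this pushes the
    human-preferred summary's reward above the rejected summary's."""
    margin = reward(weights, preferred) - reward(weights, rejected)
    return math.log(1.0 + math.exp(-margin))

# One gradient step on a single toy comparison (finite differences keep the
# sketch free of autodiff libraries).
random.seed(0)
dim, lr, eps = 4, 0.5, 1e-5
weights = [0.0] * dim
preferred = [random.random() for _ in range(dim)]
rejected = [random.random() for _ in range(dim)]

before = preference_loss(weights, preferred, rejected)
grads = []
for i in range(dim):
    bumped = weights[:]
    bumped[i] += eps
    grads.append((preference_loss(bumped, preferred, rejected) - before) / eps)
weights = [w - lr * g for w, g in zip(weights, grads)]
after = preference_loss(weights, preferred, rejected)
```

Once trained on many such comparisons, the reward model's scalar output serves as the reward signal for reinforcement-learning fine-tuning of the summarization policy.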

Fine-Tuning Language Models from Human Preferences

6 code implementations • 18 Sep 2019 • Daniel M. Ziegler, Nisan Stiennon, Jeffrey Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, Geoffrey Irving

Most work on reward learning has used simulated environments, but complex information about values is often expressed in natural language, and we believe reward learning for language is a key to making RL practical and safe for real-world tasks.

Descriptive • Language Modelling • +1
