Search Results for author: Nathan Lile

Found 1 paper, 0 papers with code

Suppressing Pink Elephants with Direct Principle Feedback

no code implementations • 12 Feb 2024 • Louis Castricato, Nathan Lile, Suraj Anand, Hailey Schoelkopf, Siddharth Verma, Stella Biderman

Existing methods for controlling language models, such as RLHF and Constitutional AI, involve determining which LLM behaviors are desirable and training them into a language model.

Language Modelling
