Search Results for author: Gregory Serapio-Garcia

Found 1 paper, 1 paper with code

Moral Foundations of Large Language Models

1 code implementation • 23 Oct 2023 • Marwa Abdulhai, Gregory Serapio-Garcia, Clément Crepy, Daria Valter, John Canny, Natasha Jaques

Finally, we show that we can adversarially select prompts that encourage the model to exhibit a particular set of moral foundations, and that this can affect the model's behavior on downstream tasks.
