Search Results for author: Alexa Y. Pan

Found 1 paper, 1 paper with code

How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions

1 code implementation • 26 Sep 2023 • Lorenzo Pacchiardi, Alex J. Chan, Sören Mindermann, Ilan Moscovitz, Alexa Y. Pan, Yarin Gal, Owain Evans, Jan Brauner

Large language models (LLMs) can "lie", which we define as outputting false statements despite "knowing" the truth in a demonstrable sense.

