1 code implementation • 20 Nov 2023 • David Rein, Betty Li Hou, Asa Cooper Stickland, Jackson Petty, Richard Yuanzhe Pang, Julien Dirani, Julian Michael, Samuel R. Bowman
We present GPQA, a challenging dataset of 448 multiple-choice questions written by domain experts in biology, physics, and chemistry.
1 code implementation • 15 Nov 2023 • Julian Michael, Salsabila Mahdi, David Rein, Jackson Petty, Julien Dirani, Vishakh Padmakumar, Samuel R. Bowman
Comparing debate to a baseline we call consultancy, where a single expert argues for only one answer which is correct half of the time, we find that debate performs significantly better, with 84% judge accuracy compared to consultancy's 74%.
1 code implementation • 18 Dec 2020 • Anilesh K. Krishnaswamy, Haoming Li, David Rein, Hanrui Zhang, Vincent Conitzer
To this end, we present {\sc IC-LR}, a modification of Logistic Regression that removes the incentive to strategically drop features.