Search Results for author: Avital Balwit

Found 3 papers, 0 papers with code

Specific versus General Principles for Constitutional AI

no code implementations • 20 Oct 2023 • Sandipan Kundu, Yuntao Bai, Saurav Kadavath, Amanda Askell, Andrew Callahan, Anna Chen, Anna Goldie, Avital Balwit, Azalia Mirhoseini, Brayden McLean, Catherine Olsson, Cassie Evraets, Eli Tran-Johnson, Esin Durmus, Ethan Perez, Jackson Kernion, Jamie Kerr, Kamal Ndousse, Karina Nguyen, Nelson Elhage, Newton Cheng, Nicholas Schiefer, Nova DasSarma, Oliver Rausch, Robin Larson, Shannon Yang, Shauna Kravec, Timothy Telleen-Lawton, Thomas I. Liao, Tom Henighan, Tristan Hume, Zac Hatfield-Dodds, Sören Mindermann, Nicholas Joseph, Sam McCandlish, Jared Kaplan

Constitutional AI offers an alternative, replacing human feedback with feedback from AI models conditioned only on a list of written principles.

Aligned with Whom? Direct and social goals for AI systems

no code implementations • 9 May 2022 • Anton Korinek, Avital Balwit

As artificial intelligence (AI) becomes more powerful and widespread, the AI alignment problem - how to ensure that AI systems pursue the goals that we want them to pursue - has garnered growing attention.

Truthful AI: Developing and governing AI that does not lie

no code implementations • 13 Oct 2021 • Owain Evans, Owen Cotton-Barratt, Lukas Finnveden, Adam Bales, Avital Balwit, Peter Wills, Luca Righetti, William Saunders

Establishing norms or laws of AI truthfulness will require significant work to: (1) identify clear truthfulness standards; (2) create institutions that can judge adherence to those standards; and (3) develop AI systems that are robustly truthful.
