no code implementations • 28 Sep 2023 • Stuart Armstrong, Alexandre Maranhão, Oliver Daniels-Koch, Patrick Leask, Rebecca Gorman
Goal misgeneralisation is a key challenge in AI alignment, the task of getting powerful Artificial Intelligences to align their goals with human intentions and human morality.
no code implementations • 30 Aug 2023 • Matija Franklin, Philip Moreira Tomei, Rebecca Gorman
The European Union's Artificial Intelligence Act aims to regulate manipulative and harmful uses of AI, but lacks precise definitions for key concepts.
no code implementations • 19 Jun 2023 • Matija Franklin, Rebecca Gorman, Hal Ashton, Stuart Armstrong
This article is a primer on concept extrapolation: the ability to take a concept, a feature, or a goal that is defined in one context and extrapolate it safely to a more general context.
no code implementations • 20 Mar 2022 • Matija Franklin, Hal Ashton, Rebecca Gorman, Stuart Armstrong
We operationalize preference to incorporate concepts from various disciplines, outlining the importance of meta-preferences and preference-change preferences, and proposing a preliminary framework for how preferences change.
no code implementations • 28 Feb 2022 • Rebecca Gorman, Stuart Armstrong
For an artificial intelligence (AI) to be aligned with human values (or human preferences), it must first learn those values.