23 Feb 2022 • Chris van Merwijk, Ryan Carey, Tom Everitt
Influence diagrams have recently been used to analyse the safety and fairness properties of AI systems.
5 Jun 2019 • Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, Scott Garrabrant
We analyze the type of learned optimization that occurs when a learned model (such as a neural network) is itself an optimizer, a situation we refer to as mesa-optimization (a neologism we introduce in this paper).