no code implementations • 28 Jul 2023 • Thomas McGrath, Matthew Rahtz, Janos Kramar, Vladimir Mikulik, Shane Legg
We investigate the internal structure of language model computations using causal analysis and demonstrate two motifs: (1) a form of adaptive computation where ablations of one attention layer of a language model cause another layer to compensate (which we term the Hydra effect) and (2) a counterbalancing function of late MLP layers that act to downregulate the maximum-likelihood token.
no code implementations • 13 Apr 2023 • Victoria Krakovna, Janos Kramar
Power-seeking behavior is a key source of risk from advanced AI, but our theoretical understanding of this phenomenon is relatively limited.
no code implementations • 25 Sep 2019 • Thomas Anthony, Ian Gemp, Janos Kramar, Tom Eccles, Andrea Tacchetti, Yoram Bachrach
In contrast to auctions designed manually by economists, our method searches the possible design space using a simulation of the multi-agent learning process, and can thus handle settings where a game-theoretic equilibrium analysis is not tractable.
no code implementations • 24 Jul 2017 • James Babcock, Janos Kramar, Roman V. Yampolskiy
With almost daily improvements in capabilities of artificial intelligence, it is more important than ever to develop safety software for use by the AI research community.
no code implementations • 2 Apr 2016 • James Babcock, Janos Kramar, Roman Yampolskiy
There is considerable uncertainty about what properties, capabilities and motivations future AGIs will have.