no code implementations • 1 Mar 2024 • János Kramár, Tom Lieberum, Rohin Shah, Neel Nanda
We investigate Attribution Patching (AtP), a fast gradient-based approximation to Activation Patching, and find two classes of failure modes of AtP which lead to significant false negatives.
no code implementations • 5 Sep 2023 • Vikrant Varma, Rohin Shah, Zachary Kenton, János Kramár, Ramana Kumar
One of the most surprising puzzles in neural network generalisation is grokking: a network with perfect training accuracy but poor generalisation will, upon further training, transition to perfect generalisation.
no code implementations • 18 Jul 2023 • Tom Lieberum, Matthew Rahtz, János Kramár, Neel Nanda, Geoffrey Irving, Rohin Shah, Vladimir Mikulik
Circuit analysis is a promising technique for understanding the internal mechanisms of language models.
1 code implementation • NeurIPS 2023 • David Lindner, János Kramár, Sebastian Farquhar, Matthew Rahtz, Thomas McGrath, Vladimir Mikulik
Additionally, the known structure of Tracr-compiled models can serve as ground truth for evaluating interpretability methods.
2 code implementations • NeurIPS 2020 • Thomas Anthony, Tom Eccles, Andrea Tacchetti, János Kramár, Ian Gemp, Thomas C. Hudson, Nicolas Porcel, Marc Lanctot, Julien Pérolat, Richard Everett, Roman Werpachowski, Satinder Singh, Thore Graepel, Yoram Bachrach
It also features a large combinatorial action space and simultaneous moves, which are challenging for RL algorithms.
15 code implementations • 26 Aug 2019 • Marc Lanctot, Edward Lockhart, Jean-Baptiste Lespiau, Vinicius Zambaldi, Satyaki Upadhyay, Julien Pérolat, Sriram Srinivasan, Finbarr Timbers, Karl Tuyls, Shayegan Omidshafiei, Daniel Hennes, Dustin Morrill, Paul Muller, Timo Ewalds, Ryan Faulkner, János Kramár, Bart De Vylder, Brennan Saeta, James Bradbury, David Ding, Sebastian Borgeaud, Matthew Lai, Julian Schrittwieser, Thomas Anthony, Edward Hughes, Ivo Danihelka, Jonah Ryan-Davis
OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games.
no code implementations • 19 Mar 2019 • Tom Eccles, Edward Hughes, János Kramár, Steven Wheelwright, Joel Z. Leibo
We analyse the resulting policies to show that the reciprocating agents are strongly influenced by their co-players' behavior.
1 code implementation • ICLR 2018 • Yuke Zhu, Ziyu Wang, Josh Merel, Andrei Rusu, Tom Erez, Serkan Cabi, Saran Tunyasuvunakool, János Kramár, Raia Hadsell, Nando de Freitas, Nicolas Heess
We propose a model-free deep reinforcement learning method that leverages a small amount of demonstration data to assist a reinforcement learning agent.
6 code implementations • 3 Jun 2016 • David Krueger, Tegan Maharaj, János Kramár, Mohammad Pezeshki, Nicolas Ballas, Nan Rosemary Ke, Anirudh Goyal, Yoshua Bengio, Aaron Courville, Chris Pal
We propose zoneout, a novel method for regularizing RNNs.