no code implementations • 4 Apr 2024 • Angus Nicolson, Lisa Schut, J. Alison Noble, Yarin Gal
We introduce tools designed to detect the presence of these properties, provide insight into how they affect the derived explanations, and provide recommendations to minimise their impact.
1 code implementation • 3 Apr 2024 • Stephen Casper, Jieun Yun, Joonhyuk Baek, Yeseong Jung, Minhwan Kim, Kiwan Kwon, Saerom Park, Hayden Moore, David Shriver, Marissa Connor, Keltin Grimes, Angus Nicolson, Arush Tagade, Jessica Rumbelow, Hieu Minh Nguyen, Dylan Hadfield-Menell
Interpretability techniques are valuable for helping humans understand and oversee AI systems.