1 code implementation • 5 Feb 2024 • Vinitra Swamy, Julian Blackwell, Jibril Frej, Martin Jaggi, Tanja Käser
Real-world interpretability for neural networks is a tradeoff between three concerns: 1) requiring humans to trust the explanation's approximation (e.g., post-hoc approaches), 2) compromising the understandability of the explanation (e.g., automatically identified feature masks), and 3) compromising the model's performance (e.g., decision trees).