Search Results for author: Jonathan Tu

Found 3 papers, 0 papers with code

Understanding the Inner Workings of Language Models Through Representation Dissimilarity

no code implementations • 23 Oct 2023 • Davis Brown, Charles Godfrey, Nicholas Konz, Jonathan Tu, Henry Kvinge

As language models are applied to an increasing number of real-world applications, understanding their inner workings has become an important issue in model trust, interpretability, and transparency.

Language Modelling

Attributing Learned Concepts in Neural Networks to Training Data

no code implementations • 4 Oct 2023 • Nicholas Konz, Charles Godfrey, Madelyn Shapiro, Jonathan Tu, Henry Kvinge, Davis Brown

By now there is substantial evidence that deep learning models learn certain human-interpretable features as part of their internal representations of data.

Edit at your own risk: evaluating the robustness of edited models to distribution shifts

no code implementations • 28 Feb 2023 • Davis Brown, Charles Godfrey, Cody Nizinski, Jonathan Tu, Henry Kvinge

The current trend toward ever-larger models makes standard retraining procedures an ever-more expensive burden.

Model Editing • Navigate
