no code implementations • 19 Mar 2024 • Divyansh Singhvi, Andrej Erkelens, Raghav Jain, Diganta Misra, Naomi Saphra
Measuring nonlinear feature interaction is an established approach to understanding complex patterns of attribution in many models.
no code implementations • 29 Nov 2023 • Yash Gondhalekar, Sultan Hassan, Naomi Saphra, Sambatra Andrianomena
The generalization of machine learning (ML) models to out-of-distribution (OOD) examples remains a key challenge in extracting information from upcoming astronomical surveys.
no code implementations • 15 Nov 2023 • Ian Berlot-Attwell, A. Michael Carrell, Kumar Krishna Agrawal, Yash Sharma, Naomi Saphra
The degree to which neural networks can generalize to new combinations of familiar concepts, and the conditions under which they are able to do so, has long been an open question.
no code implementations • 8 Nov 2023 • Naomi Saphra, Eve Fleisig, Kyunghyun Cho, Adam Lopez
Many NLP researchers are experiencing an existential crisis triggered by the astonishing success of ChatGPT and other systems based on large language models (LLMs).
1 code implementation • 5 Oct 2023 • Tom Sherborne, Naomi Saphra, Pradeep Dasigi, Hao Peng
We propose Trust Region Aware Minimization (TRAM), a SAM algorithm fine-tuning for low parameter sharpness and smooth, informative representations preserving pre-trained structure.
no code implementations • 13 Sep 2023 • Angelica Chen, Ravid Shwartz-Ziv, Kyunghyun Cho, Matthew L. Leavitt, Naomi Saphra
Most interpretability research in NLP focuses on understanding the behavior and features of a fully trained model.
1 code implementation • 18 Aug 2023 • Michael Y. Hu, Angelica Chen, Naomi Saphra, Kyunghyun Cho
We use the HMM representation to study phase transitions and identify latent "detour" states that slow down convergence.
no code implementations • 24 May 2023 • Zachary Ankner, Naomi Saphra, Davis Blalock, Jonathan Frankle, Matthew L. Leavitt
Most works on transformers trained with the Masked Language Modeling (MLM) objective use the original BERT model's fixed masking rate of 15%.
no code implementations • 17 Nov 2022 • Bingchen Zhao, Yuling Gu, Jessica Zosa Forde, Naomi Saphra
At NeurIPS, American and Chinese institutions cite papers from each other's regions substantially less than they cite endogamously.
no code implementations • 6 Oct 2022 • Dieuwke Hupkes, Mario Giulianelli, Verna Dankers, Mikel Artetxe, Yanai Elazar, Tiago Pimentel, Christos Christodoulopoulos, Karim Lasri, Naomi Saphra, Arabella Sinclair, Dennis Ulmer, Florian Schottmann, Khuyagbaatar Batsuren, Kaiser Sun, Koustuv Sinha, Leila Khalatbari, Maria Ryskina, Rita Frieske, Ryan Cotterell, Zhijing Jin
We present a taxonomy for characterising and understanding generalisation research in NLP.
1 code implementation • COLING 2022 • Josef Valvoda, Naomi Saphra, Jonathan Rawski, Adina Williams, Ryan Cotterell
Recombining known primitive concepts into larger novel combinations is a quintessentially human cognitive capability.
1 code implementation • 24 May 2022 • Jeevesh Juneja, Rachit Bansal, Kyunghyun Cho, João Sedoc, Naomi Saphra
It is widely accepted in the mode connectivity literature that when two neural networks are trained similarly on the same data, they are connected by a path through parameter space over which test set accuracy is maintained.
3 code implementations • ICLR 2022 • Thibault Sellam, Steve Yadlowsky, Jason Wei, Naomi Saphra, Alexander D'Amour, Tal Linzen, Jasmijn Bastings, Iulia Turc, Jacob Eisenstein, Dipanjan Das, Ian Tenney, Ellie Pavlick
Experiments with pre-trained models such as BERT are often based on a single checkpoint.
no code implementations • NAACL 2021 • Jennifer C. White, Tiago Pimentel, Naomi Saphra, Ryan Cotterell
Probes are models devised to investigate the encoding of knowledge -- e.g., syntactic structure -- in contextual representations.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Naomi Saphra, Adam Lopez
To explore the inductive biases that cause these compositional representations to arise during training, we conduct simple experiments on synthetic data.
no code implementations • 6 Oct 2020 • Naomi Saphra, Adam Lopez
To explore the inductive biases that cause these compositional representations to arise during training, we conduct simple experiments on synthetic data.
1 code implementation • EMNLP 2020 • Tiago Pimentel, Naomi Saphra, Adina Williams, Ryan Cotterell
In our contribution to this discussion, we argue for a probe metric that reflects the fundamental trade-off between probe complexity and performance: the Pareto hypervolume.
no code implementations • 27 Apr 2020 • Naomi Saphra, Adam Lopez
Recent work in NLP shows that LSTM language models capture compositional structure in language data.
1 code implementation • 12 Nov 2019 • Yekun Chai, Naomi Saphra, Adam Lopez
Diverse word representations have surged in most state-of-the-art natural language processing (NLP) applications.
no code implementations • ICML Workshop Deep_Phenomen 2019 • Naomi Saphra, Adam Lopez
Concerns about interpretability, computational resources, and principled inductive priors have motivated efforts to engineer sparse neural models for NLP tasks.
no code implementations • 28 May 2019 • Naomi Saphra, Adam Lopez
LSTM-based language models exhibit compositionality in their representations, but how this behavior emerges over the course of training has not been explored.
no code implementations • WS 2018 • Naomi Saphra, Adam Lopez
A glut of recent research shows that language models capture linguistic structure.
no code implementations • NAACL 2019 • Naomi Saphra, Adam Lopez
Research has shown that neural models implicitly encode linguistic features, but there has been no research showing how these encodings arise as the models are trained.
4 code implementations • 15 Jan 2017 • Graham Neubig, Chris Dyer, Yoav Goldberg, Austin Matthews, Waleed Ammar, Antonios Anastasopoulos, Miguel Ballesteros, David Chiang, Daniel Clothiaux, Trevor Cohn, Kevin Duh, Manaal Faruqui, Cynthia Gan, Dan Garrette, Yangfeng Ji, Lingpeng Kong, Adhiguna Kuncoro, Gaurav Kumar, Chaitanya Malaviya, Paul Michel, Yusuke Oda, Matthew Richardson, Naomi Saphra, Swabha Swayamdipta, Pengcheng Yin
In the static declaration strategy that is used in toolkits like Theano, CNTK, and TensorFlow, the user first defines a computation graph (a symbolic representation of the computation), and then examples are fed into an engine that executes this computation and computes its derivatives.
1 code implementation • WS 2016 • Naomi Saphra, Adam Lopez
Existing corpora for intrinsic evaluation are not targeted towards tasks in informal domains such as Twitter or news comment forums.
no code implementations • CVPR 2014 • Andrea Vedaldi, Siddharth Mahendran, Stavros Tsogkas, Subhransu Maji, Ross Girshick, Juho Kannala, Esa Rahtu, Iasonas Kokkinos, Matthew B. Blaschko, David Weiss, Ben Taskar, Karen Simonyan, Naomi Saphra, Sammy Mohamed
We show that the collected data can be used to study the relation between part detection and attribute prediction by diagnosing the performance of classifiers that pool information from different parts of an object.
1 code implementation • WS 2013 • Nathan Schneider, Brendan O'Connor, Naomi Saphra, David Bamman, Manaal Faruqui, Noah A. Smith, Chris Dyer, Jason Baldridge
We introduce a framework for lightweight dependency syntax annotation.