Search Results for author: Naomi Saphra

Found 28 papers, 10 papers with code

Knowing Your Nonlinearities: Shapley Interactions Reveal the Underlying Structure of Data

no code implementations 19 Mar 2024 Divyansh Singhvi, Andrej Erkelens, Raghav Jain, Diganta Misra, Naomi Saphra

Measuring nonlinear feature interaction is an established approach to understanding complex patterns of attribution in many models.
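To make the idea of measuring pairwise feature interaction concrete, here is a hypothetical sketch of the exact Shapley interaction index on a toy set-valued function `v`; the function and feature count are illustrative assumptions, not the paper's setup, which applies the idea to learned models.

```python
from itertools import combinations
from math import factorial

def shapley_interaction(v, n, i, j):
    """Exact Shapley interaction index for features i, j in a game of n features."""
    others = [k for k in range(n) if k not in (i, j)]
    total = 0.0
    for r in range(len(others) + 1):
        for subset in combinations(others, r):
            S = set(subset)
            # Shapley weight for coalitions of this size (over n - 2 other features).
            weight = factorial(len(S)) * factorial(n - len(S) - 2) / factorial(n - 1)
            # Discrete second difference: the joint effect beyond individual effects.
            delta = v(S | {i, j}) - v(S | {i}) - v(S | {j}) + v(S)
            total += weight * delta
    return total

# Toy function: additive in each feature, plus a bonus only when features
# 0 and 1 are both present -- a purely pairwise nonlinearity.
def v(S):
    return len(S) + (2.0 if {0, 1} <= S else 0.0)

print(shapley_interaction(v, 3, 0, 1))  # 2.0: recovers the pairwise bonus
print(shapley_interaction(v, 3, 0, 2))  # 0.0: these features do not interact
```

The index isolates exactly the nonlinear joint term: additive contributions cancel in the second difference, so only genuine interactions survive.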

Towards out-of-distribution generalization in large-scale astronomical surveys: robust networks learn similar representations

no code implementations 29 Nov 2023 Yash Gondhalekar, Sultan Hassan, Naomi Saphra, Sambatra Andrianomena

The generalization of machine learning (ML) models to out-of-distribution (OOD) examples remains a key challenge in extracting information from upcoming astronomical surveys.

Inductive Bias Out-of-Distribution Generalization

Attribute Diversity Determines the Systematicity Gap in VQA

no code implementations 15 Nov 2023 Ian Berlot-Attwell, A. Michael Carrell, Kumar Krishna Agrawal, Yash Sharma, Naomi Saphra

The degree to which neural networks can generalize to new combinations of familiar concepts, and the conditions under which they are able to do so, has long been an open question.

Attribute Question Answering +1

First Tragedy, then Parse: History Repeats Itself in the New Era of Large Language Models

no code implementations 8 Nov 2023 Naomi Saphra, Eve Fleisig, Kyunghyun Cho, Adam Lopez

Many NLP researchers are experiencing an existential crisis triggered by the astonishing success of ChatGPT and other systems based on large language models (LLMs).

Machine Translation

TRAM: Bridging Trust Regions and Sharpness Aware Minimization

1 code implementation 5 Oct 2023 Tom Sherborne, Naomi Saphra, Pradeep Dasigi, Hao Peng

We propose Trust Region Aware Minimization (TRAM), a SAM-based algorithm that fine-tunes for low parameter sharpness and for smooth, informative representations that preserve pre-trained structure.

Domain Generalization Language Modelling +1
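Since TRAM builds on Sharpness-Aware Minimization, a minimal sketch of a single SAM step on a toy 1-D quadratic loss may help; the trust-region scaling that distinguishes TRAM is not reproduced here, and the loss, radius `rho`, and learning rate are illustrative assumptions.

```python
def grad(w):
    return w  # dL/dw for the toy loss L(w) = 0.5 * w**2

def sam_step(w, rho=0.05, lr=0.1):
    # Step 1: perturb toward the (approximate) worst case within radius rho.
    eps = rho * grad(w) / (abs(grad(w)) + 1e-12)
    # Step 2: descend using the gradient evaluated at the perturbed point.
    return w - lr * grad(w + eps)

w = 2.0
for _ in range(60):
    w = sam_step(w)
print(abs(w))  # decays toward the flat minimum at w = 0
```

The two-step structure (ascend to a nearby worst-case point, then descend using that gradient) is what makes the optimizer prefer flat minima over sharp ones.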

Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs

no code implementations 13 Sep 2023 Angelica Chen, Ravid Shwartz-Ziv, Kyunghyun Cho, Matthew L. Leavitt, Naomi Saphra

Most interpretability research in NLP focuses on understanding the behavior and features of a fully trained model.

Latent State Models of Training Dynamics

1 code implementation 18 Aug 2023 Michael Y. Hu, Angelica Chen, Naomi Saphra, Kyunghyun Cho

We use the HMM representation to study phase transitions and identify latent "detour" states that slow down convergence.

Image Classification Language Modelling +1

Dynamic Masking Rate Schedules for MLM Pretraining

no code implementations 24 May 2023 Zachary Ankner, Naomi Saphra, Davis Blalock, Jonathan Frankle, Matthew L. Leavitt

Most works on transformers trained with the Masked Language Modeling (MLM) objective use the original BERT model's fixed masking rate of 15%.

Language Modelling Masked Language Modeling +1
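As a contrast to BERT's fixed 15% rate, a dynamic schedule varies the masking rate over the course of pretraining. The sketch below shows one hypothetical form, a linear decay from a higher initial rate down to 15%; the paper's exact schedule and endpoints are not reproduced here.

```python
def masking_rate(step, total_steps, start=0.30, end=0.15):
    """Linearly decay the MLM masking rate from `start` to `end` over training.

    The 30% starting rate is an illustrative assumption; only the 15%
    endpoint comes from BERT's original fixed rate.
    """
    frac = min(step / total_steps, 1.0)  # fraction of training completed
    return start + frac * (end - start)

print(masking_rate(0, 1000))     # 0.3   (start of training)
print(masking_rate(500, 1000))   # 0.225 (halfway)
print(masking_rate(1000, 1000))  # 0.15  (BERT's fixed rate, at the end)
```

At each pretraining step, this rate would determine what fraction of input tokens are replaced with `[MASK]` before computing the MLM loss.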

One Venue, Two Conferences: The Separation of Chinese and American Citation Networks

no code implementations 17 Nov 2022 Bingchen Zhao, Yuling Gu, Jessica Zosa Forde, Naomi Saphra

At NeurIPS, American and Chinese institutions cite papers from each other's regions substantially less than they cite endogamously.

Linear Connectivity Reveals Generalization Strategies

1 code implementation 24 May 2022 Jeevesh Juneja, Rachit Bansal, Kyunghyun Cho, João Sedoc, Naomi Saphra

It is widely accepted in the mode connectivity literature that when two neural networks are trained similarly on the same data, they are connected by a path through parameter space over which test set accuracy is maintained.

CoLA QQP +1
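The mode-connectivity test the snippet describes can be sketched on a toy 1-D loss: evaluate the loss along the straight line between two solutions and look for a "barrier" above the endpoint losses. The double-well loss and step count below are illustrative assumptions, not the paper's networks.

```python
def barrier(loss, w_a, w_b, steps=21):
    """Height of the loss barrier along the linear path from w_a to w_b."""
    path = []
    for k in range(steps):
        t = k / (steps - 1)
        path.append(loss((1 - t) * w_a + t * w_b))  # loss at interpolated weights
    # A positive value means the path rises above both endpoints: a barrier.
    return max(path) - max(path[0], path[-1])

# Toy double-well loss with minima (two "basins") at w = -1 and w = +1.
double_well = lambda w: (w * w - 1.0) ** 2

print(barrier(double_well, -1.0, 1.0))  # 1.0: barrier at the midpoint, not linearly connected
print(barrier(double_well, 1.0, 1.0))   # 0.0: same basin, trivially flat
```

For real networks the same recipe interpolates entire parameter vectors and measures test loss or accuracy along the path rather than a scalar toy loss.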

A Non-Linear Structural Probe

no code implementations NAACL 2021 Jennifer C. White, Tiago Pimentel, Naomi Saphra, Ryan Cotterell

Probes are models devised to investigate the encoding of knowledge, e.g. syntactic structure, in contextual representations.

LSTMs Compose—and Learn—Bottom-Up

no code implementations Findings of the Association for Computational Linguistics 2020 Naomi Saphra, Adam Lopez

To explore the inductive biases that cause these compositional representations to arise during training, we conduct simple experiments on synthetic data.

Pareto Probing: Trading Off Accuracy for Complexity

1 code implementation EMNLP 2020 Tiago Pimentel, Naomi Saphra, Adina Williams, Ryan Cotterell

In our contribution to this discussion, we argue for a probe metric that reflects the fundamental trade-off between probe complexity and performance: the Pareto hypervolume.

Dependency Parsing
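The Pareto hypervolume the snippet proposes can be sketched in two dimensions: each probe is a (complexity, accuracy) point, and the metric is the area its Pareto front dominates relative to a worst-case reference point. The probe values and reference point below are made-up illustrations, not results from the paper.

```python
def pareto_front(points):
    """Non-dominated (complexity, accuracy) points, sorted by increasing complexity."""
    front = []
    for c, a in sorted(points, key=lambda p: (p[0], -p[1])):
        # A later point (higher complexity) survives only if it improves accuracy.
        if not front or a > front[-1][1]:
            front.append((c, a))
    return front

def hypervolume(points, ref):
    """Area dominated by the front, against reference (worst complexity, worst accuracy)."""
    c_ref, a_ref = ref
    hv, a_prev = 0.0, a_ref
    for c, a in pareto_front(points):
        hv += (c_ref - c) * (a - a_prev)  # accuracy band this probe newly dominates
        a_prev = a
    return hv

probes = [(1.0, 0.70), (2.0, 0.80), (2.5, 0.75), (3.0, 0.82)]  # (complexity, accuracy)
print(hypervolume(probes, ref=(4.0, 0.50)))  # ≈ 0.82; the (2.5, 0.75) probe is dominated
```

A larger hypervolume means the family of probes buys more accuracy per unit of complexity, which is exactly the trade-off the metric is meant to capture.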

Word Interdependence Exposes How LSTMs Compose Representations

no code implementations 27 Apr 2020 Naomi Saphra, Adam Lopez

Recent work in NLP shows that LSTM language models capture compositional structure in language data.

How to Evaluate Word Representations of Informal Domain?

1 code implementation 12 Nov 2019 Yekun Chai, Naomi Saphra, Adam Lopez

Diverse word representations have surged into most state-of-the-art natural language processing (NLP) applications.

Word Embeddings

Sparsity Emerges Naturally in Neural Language Models

no code implementations ICML 2019 Workshop on Identifying and Understanding Deep Learning Phenomena Naomi Saphra, Adam Lopez

Concerns about interpretability, computational resources, and principled inductive priors have motivated efforts to engineer sparse neural models for NLP tasks.

Do LSTMs Learn Compositionally?

no code implementations 28 May 2019 Naomi Saphra, Adam Lopez

LSTM-based language models exhibit compositionality in their representations, but how this behavior emerges over the course of training has not been explored.

Understanding Learning Dynamics Of Language Models with SVCCA

no code implementations NAACL 2019 Naomi Saphra, Adam Lopez

Research has shown that neural models implicitly encode linguistic features, but there has been no research showing \emph{how} these encodings arise as the models are trained.

Language Modelling

DyNet: The Dynamic Neural Network Toolkit

4 code implementations 15 Jan 2017 Graham Neubig, Chris Dyer, Yoav Goldberg, Austin Matthews, Waleed Ammar, Antonios Anastasopoulos, Miguel Ballesteros, David Chiang, Daniel Clothiaux, Trevor Cohn, Kevin Duh, Manaal Faruqui, Cynthia Gan, Dan Garrette, Yangfeng Ji, Lingpeng Kong, Adhiguna Kuncoro, Gaurav Kumar, Chaitanya Malaviya, Paul Michel, Yusuke Oda, Matthew Richardson, Naomi Saphra, Swabha Swayamdipta, Pengcheng Yin

In the static declaration strategy that is used in toolkits like Theano, CNTK, and TensorFlow, the user first defines a computation graph (a symbolic representation of the computation), and then examples are fed into an engine that executes this computation and computes its derivatives.

graph construction
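The static-versus-dynamic contrast in the DyNet abstract can be illustrated with a toy computation-graph class; this is deliberately not DyNet's actual API, just a sketch of the dynamic (define-by-run) style, where the graph is rebuilt per example and can depend on the input's structure.

```python
class Node:
    """A toy computation-graph node: a value plus references to its parents."""
    def __init__(self, value=0.0, parents=(), op=None):
        self.value, self.parents, self.op = value, parents, op

def add(a, b):
    return Node(a.value + b.value, (a, b), "add")

def mul(a, b):
    return Node(a.value * b.value, (a, b), "mul")

def score(sequence):
    """Dynamic declaration: the graph's shape depends on this input's length,
    with no separate compile step (unlike static toolkits such as Theano)."""
    acc = Node(0.0)
    for x in sequence:
        acc = add(acc, mul(Node(x), Node(2.0)))  # acc += 2 * x
    return acc

print(score([1.0, 2.0, 3.0]).value)  # 12.0, from a graph built on the fly
```

In the static strategy, by contrast, `score` would be declared once as a symbolic graph over placeholder inputs, compiled, and only then fed examples.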

Evaluating Informal-Domain Word Representations With UrbanDictionary

1 code implementation WS 2016 Naomi Saphra, Adam Lopez

Existing corpora for intrinsic evaluation are not targeted towards tasks in informal domains such as Twitter or news comment forums.

Understanding Objects in Detail with Fine-Grained Attributes

no code implementations CVPR 2014 Andrea Vedaldi, Siddharth Mahendran, Stavros Tsogkas, Subhransu Maji, Ross Girshick, Juho Kannala, Esa Rahtu, Iasonas Kokkinos, Matthew B. Blaschko, David Weiss, Ben Taskar, Karen Simonyan, Naomi Saphra, Sammy Mohamed

We show that the collected data can be used to study the relation between part detection and attribute prediction by diagnosing the performance of classifiers that pool information from different parts of an object.

Attribute Object +2
