Search Results for author: Sen Wu

Found 19 papers, 11 papers with code

Metadata Shaping: A Simple Approach for Knowledge-Enhanced Language Models

1 code implementation • Findings (ACL) 2022 • Simran Arora, Sen Wu, Enci Liu, Christopher Ré

We observe proposed methods typically start with a base LM and data that has been annotated with entity metadata, then change the model, by modifying the architecture or introducing auxiliary loss terms to better capture entity knowledge.

NECA: Network-Embedded Deep Representation Learning for Categorical Data

no code implementations • 25 May 2022 • Xiaonan Gao, Sen Wu, Wenjun Zhou

We propose NECA, a deep representation learning method for categorical data.

Attribute Clustering +1

Metadata Shaping: Natural Language Annotations for the Tail

1 code implementation • 16 Oct 2021 • Simran Arora, Sen Wu, Enci Liu, Christopher Ré

Since rare entities and facts are prevalent in the queries users submit to popular applications such as search and personal assistant systems, improving the ability of LMs to reliably capture knowledge over rare entities is a pressing challenge studied in significant prior work.

Cross-Domain Data Integration for Named Entity Disambiguation in Biomedical Text

1 code implementation • Findings (EMNLP) 2021 • Maya Varma, Laurel Orr, Sen Wu, Megan Leszczynski, Xiao Ling, Christopher Ré

Named entity disambiguation (NED), which involves mapping textual mentions to structured entities, is particularly challenging in the medical domain due to the presence of rare entities.

Data Integration, Entity Disambiguation

Precise High-Dimensional Asymptotics for Quantifying Heterogeneous Transfers

no code implementations • 22 Oct 2020 • Fan Yang, Hongyang R. Zhang, Sen Wu, Christopher Ré, Weijie J. Su

Intuitively, the transfer effect from one task to another task depends on dataset shifts such as sample sizes and covariance matrices.

Multi-Task Learning, text-classification +1

Bootleg: Chasing the Tail with Self-Supervised Named Entity Disambiguation

1 code implementation • 20 Oct 2020 • Laurel Orr, Megan Leszczynski, Simran Arora, Sen Wu, Neel Guha, Xiao Ling, Christopher Ré

A challenge for named entity disambiguation (NED), the task of mapping textual mentions to entities in a knowledge base, is how to disambiguate entities that appear rarely in the training data, termed tail entities.

Ranked #1 on Entity Disambiguation on AIDA-CoNLL (Micro-F1 metric)

Entity Disambiguation, Relation Extraction

210

Train and You'll Miss It: Interactive Model Iteration with Weak Supervision and Pre-Trained Embeddings

1 code implementation • 26 Jun 2020 • Mayee F. Chen, Daniel Y. Fu, Frederic Sala, Sen Wu, Ravi Teja Mullapudi, Fait Poms, Kayvon Fatahalian, Christopher Ré

Our goal is to enable machine learning systems to be trained interactively.

Transfer Learning

Understanding and Improving Information Transfer in Multi-Task Learning

no code implementations • ICLR 2020 • Sen Wu, Hongyang R. Zhang, Christopher Ré

We investigate multi-task learning approaches that use a shared feature representation for all tasks.

Multi-Task Learning, Sentiment Analysis

On the Generalization Effects of Linear Transformations in Data Augmentation

2 code implementations • ICML 2020 • Sen Wu, Hongyang R. Zhang, Gregory Valiant, Christopher Ré

We validate our proposed scheme on image and text datasets.

Data Augmentation, text-classification +1

Ivy: Instrumental Variable Synthesis for Causal Inference

no code implementations • 11 Apr 2020 • Zhaobin Kuang, Frederic Sala, Nimit Sohoni, Sen Wu, Aldo Córdova-Palomera, Jared Dunnmon, James Priest, Christopher Ré

To relax these assumptions, we propose Ivy, a new method to combine IV candidates that can handle correlated and invalid IV candidates in a robust manner.

Causal Inference, Epidemiology +1

Understanding the Downstream Instability of Word Embeddings

1 code implementation • 29 Feb 2020 • Megan Leszczynski, Avner May, Jian Zhang, Sen Wu, Christopher R. Aberger, Christopher Ré

To theoretically explain this tradeoff, we introduce a new measure of embedding instability, the eigenspace instability measure, which we prove bounds the disagreement in downstream predictions introduced by the change in word embeddings.

Word Embeddings

Slice-based Learning: A Programming Model for Residual Learning in Critical Data Slices

2 code implementations • NeurIPS 2019 • Vincent S. Chen, Sen Wu, Zhenzhen Weng, Alexander Ratner, Christopher Ré

In real-world machine learning applications, data subsets correspond to especially critical outcomes: vulnerable cyclist detections are safety-critical in an autonomous driving task, and "question" sentences might be important to a dialogue agent's language understanding for product purposes.

Autonomous Driving, BIG-bench Machine Learning

376

Snorkel: Rapid Training Data Creation with Weak Supervision

2 code implementations • 28 Nov 2017 • Alexander Ratner, Stephen H. Bach, Henry Ehrenberg, Jason Fries, Sen Wu, Christopher Ré

In a user study, subject matter experts build models 2.8x faster and increase predictive performance an average 45.5% versus seven hours of hand labeling.

BIG-bench Machine Learning

420

Robust Sparse Coding via Self-Paced Learning

no code implementations • 10 Sep 2017 • Xiaodong Feng, Zhiwei Tang, Sen Wu

Sparse coding (SC) is attracting more and more attention due to its comprehensive theoretical studies and its excellent performance in many signal processing applications.

SwellShark: A Generative Model for Biomedical Named Entity Recognition without Labeled Data

no code implementations • 20 Apr 2017 • Jason Fries, Sen Wu, Alex Ratner, Christopher Ré

We present SwellShark, a framework for building biomedical named entity recognition (NER) systems quickly and without hand-labeled data.

Ranked #2 on Weakly-Supervised Named Entity Recognition on BC5CDR

named-entity-recognition, Named Entity Recognition +2

Fonduer: Knowledge Base Construction from Richly Formatted Data

1 code implementation • 15 Mar 2017 • Sen Wu, Luke Hsiao, Xiao Cheng, Braden Hancock, Theodoros Rekatsinas, Philip Levis, Christopher Ré

We focus on knowledge base construction (KBC) from richly formatted data.

Databases

402

Data Programming: Creating Large Training Sets, Quickly

4 code implementations • NeurIPS 2016 • Alexander Ratner, Christopher De Sa, Sen Wu, Daniel Selsam, Christopher Ré

Additionally, in initial user studies we observed that data programming may be an easier way for non-experts to create machine learning models when training data is limited or unavailable.

BIG-bench Machine Learning, Slot Filling

5,702

Incremental Knowledge Base Construction Using DeepDive

no code implementations • 3 Feb 2015 • Jaeho Shin, Sen Wu, Feiran Wang, Christopher De Sa, Ce Zhang, Christopher Ré

Populating a database with unstructured information is a long-standing problem in industry and research that encompasses problems of extraction, cleaning, and integration.

Feature Engineering for Knowledge Base Construction

no code implementations • 24 Jul 2014 • Christopher Ré, Amir Abbas Sadeghian, Zifei Shan, Jaeho Shin, Feiran Wang, Sen Wu, Ce Zhang

Our approach to KBC is based on joint probabilistic inference and learning, but we do not see inference as either a panacea or a magic bullet: inference is a tool that allows us to be systematic in how we construct, debug, and improve the quality of such systems.

Feature Engineering
