1 code implementation • Findings (ACL) 2022 • Simran Arora, Sen Wu, Enci Liu, Christopher Ré
We observe that proposed methods typically start with a base LM and data that has been annotated with entity metadata, then change the model by modifying the architecture or introducing auxiliary loss terms to better capture entity knowledge.
no code implementations • 25 May 2022 • Xiaonan Gao, Sen Wu, Wenjun Zhou
We propose NECA, a deep representation learning method for categorical data.
1 code implementation • 16 Oct 2021 • Simran Arora, Sen Wu, Enci Liu, Christopher Ré
Since rare entities and facts are prevalent in the queries users submit to popular applications such as search and personal assistant systems, improving the ability of LMs to reliably capture knowledge over rare entities is a pressing challenge studied in significant prior work.
1 code implementation • Findings (EMNLP) 2021 • Maya Varma, Laurel Orr, Sen Wu, Megan Leszczynski, Xiao Ling, Christopher Ré
Named entity disambiguation (NED), which involves mapping textual mentions to structured entities, is particularly challenging in the medical domain due to the presence of rare entities.
no code implementations • 22 Oct 2020 • Fan Yang, Hongyang R. Zhang, Sen Wu, Christopher Ré, Weijie J. Su
Intuitively, the transfer effect from one task to another task depends on dataset shifts such as sample sizes and covariance matrices.
1 code implementation • 20 Oct 2020 • Laurel Orr, Megan Leszczynski, Simran Arora, Sen Wu, Neel Guha, Xiao Ling, Christopher Ré
A challenge for named entity disambiguation (NED), the task of mapping textual mentions to entities in a knowledge base, is how to disambiguate entities that appear rarely in the training data, termed tail entities.
Ranked #1 on Entity Disambiguation on AIDA-CoNLL (Micro-F1 metric)
1 code implementation • 26 Jun 2020 • Mayee F. Chen, Daniel Y. Fu, Frederic Sala, Sen Wu, Ravi Teja Mullapudi, Fait Poms, Kayvon Fatahalian, Christopher Ré
Our goal is to enable machine learning systems to be trained interactively.
no code implementations • ICLR 2020 • Sen Wu, Hongyang R. Zhang, Christopher Ré
We investigate multi-task learning approaches that use a shared feature representation for all tasks.
2 code implementations • ICML 2020 • Sen Wu, Hongyang R. Zhang, Gregory Valiant, Christopher Ré
We validate our proposed scheme on image and text datasets.
no code implementations • 11 Apr 2020 • Zhaobin Kuang, Frederic Sala, Nimit Sohoni, Sen Wu, Aldo Córdova-Palomera, Jared Dunnmon, James Priest, Christopher Ré
To relax these assumptions, we propose Ivy, a new method to combine IV candidates that can handle correlated and invalid IV candidates in a robust manner.
1 code implementation • 29 Feb 2020 • Megan Leszczynski, Avner May, Jian Zhang, Sen Wu, Christopher R. Aberger, Christopher Ré
To theoretically explain this tradeoff, we introduce a new measure of embedding instability---the eigenspace instability measure---which we prove bounds the disagreement in downstream predictions introduced by the change in word embeddings.
2 code implementations • NeurIPS 2019 • Vincent S. Chen, Sen Wu, Zhenzhen Weng, Alexander Ratner, Christopher Ré
In real-world machine learning applications, data subsets correspond to especially critical outcomes: vulnerable cyclist detections are safety-critical in an autonomous driving task, and "question" sentences might be important to a dialogue agent's language understanding for product purposes.
2 code implementations • 28 Nov 2017 • Alexander Ratner, Stephen H. Bach, Henry Ehrenberg, Jason Fries, Sen Wu, Christopher Ré
In a user study, subject matter experts build models 2.8x faster and increase predictive performance an average 45.5% versus seven hours of hand labeling.
no code implementations • 10 Sep 2017 • Xiaodong Feng, Zhiwei Tang, Sen Wu
Sparse coding (SC) is attracting more and more attention due to its comprehensive theoretical studies and its excellent performance in many signal processing applications.
no code implementations • 20 Apr 2017 • Jason Fries, Sen Wu, Alex Ratner, Christopher Ré
We present SwellShark, a framework for building biomedical named entity recognition (NER) systems quickly and without hand-labeled data.
Ranked #2 on Weakly-Supervised Named Entity Recognition on BC5CDR
1 code implementation • 15 Mar 2017 • Sen Wu, Luke Hsiao, Xiao Cheng, Braden Hancock, Theodoros Rekatsinas, Philip Levis, Christopher Ré
We focus on knowledge base construction (KBC) from richly formatted data.
Databases
4 code implementations • NeurIPS 2016 • Alexander Ratner, Christopher De Sa, Sen Wu, Daniel Selsam, Christopher Ré
Additionally, in initial user studies we observed that data programming may be an easier way for non-experts to create machine learning models when training data is limited or unavailable.
no code implementations • 3 Feb 2015 • Jaeho Shin, Sen Wu, Feiran Wang, Christopher De Sa, Ce Zhang, Christopher Ré
Populating a database with unstructured information is a long-standing problem in industry and research that encompasses problems of extraction, cleaning, and integration.
no code implementations • 24 Jul 2014 • Christopher Ré, Amir Abbas Sadeghian, Zifei Shan, Jaeho Shin, Feiran Wang, Sen Wu, Ce Zhang
Our approach to KBC is based on joint probabilistic inference and learning, but we do not see inference as either a panacea or a magic bullet: inference is a tool that allows us to be systematic in how we construct, debug, and improve the quality of such systems.