no code implementations • EMNLP (NLP-COVID19) 2020 • Austin Van Loon, Sheridan Stewart, Brandon Waldon, Shrinidhi K Lakshmikanth, Ishan Shah, Sharath Chandra Guntuku, Garrick Sherman, James Zou, Johannes Eichstaedt
Our ability to limit the future spread of COVID-19 will in part depend on our understanding of the psychological and sociological processes that lead people to follow or reject coronavirus health behaviors.
no code implementations • 26 Apr 2024 • Valeriia Cherepanova, James Zou
Large language models (LLMs) exhibit excellent ability to understand human languages, but do they also understand their own language that appears gibberish to us?
1 code implementation • 19 Apr 2024 • Yuchi Liu, Lei Wang, Yuli Zou, James Zou, Liang Zheng
For example, for a narrow misclassification, a calibrator trained by the CE loss often produces high confidence on the wrongly predicted class (e.g., a test sample is wrongly classified and its softmax score on the ground truth class is around 0.4), which is undesirable.
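As an illustration of the failure mode described above, the following toy sketch shows a "narrow misclassification", where the predicted class wins only slightly over the ground truth; the logit values are invented for illustration and are not taken from the paper:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical logits for a 3-class problem; the true class is index 0.
logits = np.array([1.2, 1.6, 0.1])
probs = softmax(logits)
pred = int(np.argmax(probs))  # class 1 narrowly beats the true class 0

# The ground-truth class still gets substantial probability mass (~0.35),
# so a calibrator that pushes confidence toward 1.0 on the predicted
# class would be overconfident on this kind of borderline error.
print(f"predicted={pred}, p(true)={probs[0]:.2f}, p(pred)={probs[pred]:.2f}")
```

The point is that a well-behaved calibrator should report moderate confidence on such borderline errors, rather than the near-certainty a CE-trained calibrator tends to produce.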
1 code implementation • 19 Apr 2024 • Shirley Wu, Shiyu Zhao, Michihiro Yasunaga, Kexin Huang, Kaidi Cao, Qian Huang, Vassilis N. Ioannidis, Karthik Subbian, James Zou, Jure Leskovec
Answering real-world user queries, such as product search, often requires accurate retrieval of information from semi-structured knowledge bases or databases that involve a blend of unstructured (e.g., textual descriptions of products) and structured (e.g., entity relations of products) information.
no code implementations • 16 Apr 2024 • Kevin Wu, Eric Wu, James Zou
However, when the reference document is perturbed with increasing levels of wrong values, the LLM is more likely to recite the incorrect, modified information when its internal prior is weaker but is more resistant when its prior is stronger.
no code implementations • 4 Mar 2024 • Lingjiao Chen, Jared Quincy Davis, Boris Hanin, Peter Bailis, Ion Stoica, Matei Zaharia, James Zou
Surprisingly, we find empirically that across multiple language tasks, the performance of Voting Inference Systems first increases and then decreases as a function of the number of LLM calls.
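This non-monotone behavior can be reproduced in a toy simulation (my own illustrative model, not the paper's analysis): when a workload mixes queries the model usually answers correctly with queries it answers correctly less than half the time, majority voting over more calls helps on the former but eventually locks in errors on the latter.

```python
from math import comb

def majority_correct(p, n):
    """P(a majority of n independent calls is correct), each correct
    with probability p. Assumes n is odd so there are no ties."""
    k = n // 2 + 1
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def system_accuracy(n):
    # Toy workload: 60% "easy" queries (per-call accuracy 0.8) and
    # 40% "hard" queries (per-call accuracy 0.45, worse than a coin flip).
    return 0.6 * majority_correct(0.8, n) + 0.4 * majority_correct(0.45, n)

# Accuracy rises from 1 to a few calls, then falls again for many calls.
accs = {n: system_accuracy(n) for n in [1, 3, 11, 101]}
```

As the number of calls grows, the easy queries saturate at accuracy 1 while the hard queries are driven toward 0, so total accuracy peaks at an intermediate number of calls.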
1 code implementation • 28 Feb 2024 • Simran Arora, Sabri Eyuboglu, Michael Zhang, Aman Timalsina, Silas Alberti, Dylan Zinsley, James Zou, Atri Rudra, Christopher Ré
In this work, we explore whether we can improve language model efficiency (e.g., by reducing memory consumption) without compromising on recall.
no code implementations • 21 Feb 2024 • Federico Bianchi, James Zou
The risks derived from large language models (LLMs) generating deceptive and damaging content have been the subject of considerable research, but even safe generations can lead to problematic downstream impacts.
1 code implementation • 18 Feb 2024 • Gautam Machiraju, Alexander Derry, Arjun Desai, Neel Guha, Amir-Hossein Karimi, James Zou, Russ Altman, Christopher Ré, Parag Mallick
Feature attribution, the ability to localize regions of the input data that are relevant for classification, is an important capability for machine learning models in scientific and biomedical domains.
1 code implementation • 8 Feb 2024 • Federico Bianchi, Patrick John Chia, Mert Yuksekgonul, Jacopo Tagliabue, Dan Jurafsky, James Zou
We develop NegotiationArena: a flexible framework for evaluating and probing the negotiation abilities of LLM agents.
1 code implementation • 7 Feb 2024 • Weixin Liang, Nazneen Rajani, Xinyu Yang, Ezinwanne Ozoani, Eric Wu, Yiqun Chen, Daniel Scott Smith, James Zou
To evaluate the impact of model cards, we conducted an intervention study by adding detailed model cards to 42 popular models which had no or sparse model cards previously.
no code implementations • 4 Feb 2024 • Haowei Lin, Baizhou Huang, Haotian Ye, Qinyu Chen, ZiHao Wang, Sujian Li, Jianzhu Ma, Xiaojun Wan, James Zou, Yitao Liang
The ever-growing ecosystem of LLMs has posed a challenge in selecting the most appropriate pre-trained model to fine-tune amidst a sea of options.
no code implementations • 3 Feb 2024 • Kevin Wu, Eric Wu, Ally Cassasola, Angela Zhang, Kevin Wei, Teresa Nguyen, Sith Riantawan, Patricia Shi Riantawan, Daniel E. Ho, James Zou
In this paper, we ask: do the sources that LLMs generate actually support the claims that they make?
1 code implementation • 29 Jan 2024 • Ian Covert, Chanwoo Kim, Su-In Lee, James Zou, Tatsunori Hashimoto
Many tasks in explainable machine learning, such as data valuation and feature attribution, perform expensive computation for each data point and can be intractable for large datasets.
1 code implementation • 24 Jan 2024 • Xinyu Yang, Weixin Liang, James Zou
By analyzing all 7,433 dataset documentation pages on Hugging Face, our investigation provides an overview of the Hugging Face dataset ecosystem and insights into dataset documentation practices, yielding five main findings: (1) The dataset card completion rate shows marked heterogeneity correlated with dataset popularity.
1 code implementation • 10 Jan 2024 • Lichao Sun, Yue Huang, Haoran Wang, Siyuan Wu, Qihui Zhang, Yuan Li, Chujie Gao, Yixin Huang, Wenhan Lyu, Yixuan Zhang, Xiner Li, Zhengliang Liu, Yixin Liu, Yijue Wang, Zhikun Zhang, Bertie Vidgen, Bhavya Kailkhura, Caiming Xiong, Chaowei Xiao, Chunyuan Li, Eric Xing, Furong Huang, Hao liu, Heng Ji, Hongyi Wang, huan zhang, Huaxiu Yao, Manolis Kellis, Marinka Zitnik, Meng Jiang, Mohit Bansal, James Zou, Jian Pei, Jian Liu, Jianfeng Gao, Jiawei Han, Jieyu Zhao, Jiliang Tang, Jindong Wang, Joaquin Vanschoren, John Mitchell, Kai Shu, Kaidi Xu, Kai-Wei Chang, Lifang He, Lifu Huang, Michael Backes, Neil Zhenqiang Gong, Philip S. Yu, Pin-Yu Chen, Quanquan Gu, ran Xu, Rex Ying, Shuiwang Ji, Suman Jana, Tianlong Chen, Tianming Liu, Tianyi Zhou, William Wang, Xiang Li, Xiangliang Zhang, Xiao Wang, Xing Xie, Xun Chen, Xuyu Wang, Yan Liu, Yanfang Ye, Yinzhi Cao, Yong Chen, Yue Zhao
This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, an established benchmark, an evaluation and analysis of trustworthiness for mainstream LLMs, and a discussion of open challenges and future directions.
no code implementations • 4 Jan 2024 • Lizhen Liang, Han Zhuang, James Zou, Daniel E. Acuna
Artificial intelligence (AI) has seen tremendous development in industry and academia.
no code implementations • 3 Jan 2024 • Haonan Wang, James Zou, Michael Mozer, Anirudh Goyal, Alex Lamb, Linjun Zhang, Weijie J Su, Zhun Deng, Michael Qizhe Xie, Hannah Brown, Kenji Kawaguchi
With the rise of advanced generative AI models capable of tasks once reserved for human creativity, the study of AI's creative potential becomes imperative for its responsible development and application.
no code implementations • 20 Dec 2023 • Jiachen Zhao, Zhun Deng, David Madras, James Zou, Mengye Ren
As the number of large language models (LLMs) released to the public grows, there is a pressing need to understand the safety implications associated with these models learning from third-party custom finetuning data.
2 code implementations • 8 Dec 2023 • Simran Arora, Sabri Eyuboglu, Aman Timalsina, Isys Johnson, Michael Poli, James Zou, Atri Rudra, Christopher Ré
To close the gap between synthetics and real language, we develop a new formalization of the task called multi-query associative recall (MQAR) that better reflects actual language.
no code implementations • 7 Dec 2023 • Shirley Wu, Kaidi Cao, Bruno Ribeiro, James Zou, Jure Leskovec
Graph data are inherently complex and heterogeneous, leading to a high natural diversity of distributional shifts.
no code implementations • 4 Dec 2023 • Karanpartap Singh, James Zou
With the increasing use of large-language models (LLMs) like ChatGPT, watermarking has emerged as a promising approach for tracing machine-generated content.
no code implementations • 22 Nov 2023 • Lingjiao Chen, Bilge Acun, Newsha Ardalani, Yifan Sun, Feiyang Kang, Hanrui Lyu, Yongchan Kwon, Ruoxi Jia, Carole-Jean Wu, Matei Zaharia, James Zou
As Machine Learning (ML) systems continue to grow, the demand for relevant and comprehensive datasets becomes imperative.
no code implementations • 21 Nov 2023 • Luis Oala, Manil Maskey, Lilith Bat-Leah, Alicia Parrish, Nezihe Merve Gürel, Tzu-Sheng Kuo, Yang Liu, Rotem Dror, Danilo Brajovic, Xiaozhe Yao, Max Bartolo, William A Gaviria Rojas, Ryan Hileman, Rainier Aliment, Michael W. Mahoney, Meg Risdal, Matthew Lease, Wojciech Samek, Debojyoti Dutta, Curtis G Northcutt, Cody Coleman, Braden Hancock, Bernard Koch, Girmaw Abebe Tadesse, Bojan Karlaš, Ahmed Alaa, Adji Bousso Dieng, Natasha Noy, Vijay Janapa Reddi, James Zou, Praveen Paritosh, Mihaela van der Schaar, Kurt Bollacker, Lora Aroyo, Ce Zhang, Joaquin Vanschoren, Isabelle Guyon, Peter Mattson
Drawing from discussions at the inaugural DMLR workshop at ICML 2023 and meetings prior, in this report we outline the relevance of community engagement and infrastructure development for the creation of next-generation public datasets that will advance machine learning science.
1 code implementation • 11 Nov 2023 • Sheng Liu, Haotian Ye, Lei Xing, James Zou
On a new query, instead of adding demonstrations to the prompt, we shift the latent states of the LLM using the ICV.
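The general idea of steering a model by adding a latent vector can be sketched as follows; this is a toy numpy illustration under my own simplifying assumptions (names and details are invented), not the paper's exact ICV recipe:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy hidden-state dimension

# Hidden states for a few demonstration inputs and their desired outputs.
# Here we pretend the task corresponds to a constant +1 shift in latent space.
h_inputs = rng.normal(size=(5, d))
h_targets = h_inputs + 1.0

# "In-context vector": the mean latent difference across demonstrations.
icv = (h_targets - h_inputs).mean(axis=0)

# At query time, shift the query's latent state by the ICV instead of
# prepending the demonstrations to the prompt.
h_query = rng.normal(size=d)
h_steered = h_query + icv
```

The appeal is that the demonstrations are compressed into a single vector once, so the prompt on each new query stays short.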
no code implementations • 10 Nov 2023 • Angela Zhang, Mert Yuksekgonul, Joshua Guild, James Zou, Joseph C. Wu
One early application has been to medicine, where LLMs have been investigated to streamline clinical workflows and facilitate clinical analysis and decision-making.
1 code implementation • 6 Nov 2023 • Chenhang Cui, Yiyang Zhou, Xinyu Yang, Shirley Wu, Linjun Zhang, James Zou, Huaxiu Yao
To bridge this gap, we introduce a new benchmark, namely, the Bias and Interference Challenges in Visual Language Models (Bingo).
1 code implementation • 3 Oct 2023 • Weixin Liang, Yuhui Zhang, Hancheng Cao, Binglu Wang, Daisy Ding, Xinyu Yang, Kailas Vodrahalli, Siyu He, Daniel Smith, Yian Yin, Daniel McFarland, James Zou
We first quantitatively compared GPT-4's generated feedback with human peer reviewer feedback in 15 Nature family journals (3,096 papers in total) and the ICLR machine learning conference (1,709 papers).
1 code implementation • 2 Oct 2023 • Yongchan Kwon, Eric Wu, Kevin Wu, James Zou
Quantifying the impact of training data points is crucial for understanding the outputs of machine learning models and for improving the transparency of the AI pipeline.
2 code implementations • 14 Sep 2023 • Federico Bianchi, Mirac Suzgun, Giuseppe Attanasio, Paul Röttger, Dan Jurafsky, Tatsunori Hashimoto, James Zou
Training large language models to follow instructions makes them perform better on a wide range of tasks and generally become more helpful.
no code implementations • 31 Aug 2023 • Jesutofunmi A. Omiye, Haiwen Gui, Shawheen J. Rezaei, James Zou, Roxana Daneshjou
Large language models (LLMs) have been applied to tasks in healthcare, ranging from medical exam questions to responding to patient questions.
1 code implementation • 3 Aug 2023 • Rong Ma, Eric D. Sun, David Donoho, James Zou
To overcome these limitations, we present a spectral manifold alignment and inference (SMAI) framework, which enables principled and interpretable alignability testing and structure-preserving integration of single-cell data with the same type of features.
4 code implementations • 18 Jul 2023 • Lingjiao Chen, Matei Zaharia, James Zou
We find that the performance and behavior of both GPT-3.5 and GPT-4 can vary greatly over time.
no code implementations • 6 Jul 2023 • Xinming Tu, James Zou, Weijie J. Su, Linjun Zhang
LLMs can also play a significant role in the classroom as interactive teaching and learning tools, contributing to personalized education.
1 code implementation • 13 Jun 2023 • Kailas Vodrahalli, James Zou
To study this interaction, we created ArtWhisperer, an online game where users are given a target image and are tasked with iteratively finding a prompt that creates a similar-looking image as the target.
1 code implementation • NeurIPS 2023 • Paul Pu Liang, Zihao Deng, Martin Ma, James Zou, Louis-Philippe Morency, Ruslan Salakhutdinov
How can we learn self-supervised multimodal representations to capture both shared and unique information relevant to downstream tasks?
1 code implementation • 27 May 2023 • Yuhui Zhang, Michihiro Yasunaga, Zhengping Zhou, Jeff Z. HaoChen, James Zou, Percy Liang, Serena Yeung
Language models have been shown to exhibit positive scaling, where performance improves as models are scaled up in terms of size, compute, or data.
1 code implementation • 26 May 2023 • Kai Zhang, Jun Yu, Eashan Adhikarla, Rong Zhou, Zhiling Yan, Yixin Liu, Zhengliang Liu, Lifang He, Brian Davison, Xiang Li, Hui Ren, Sunyang Fu, James Zou, Wei Liu, Jing Huang, Chen Chen, Yuyin Zhou, Tianming Liu, Xun Chen, Yong Chen, Quanzheng Li, Hongfang Liu, Lichao Sun
Conventional task- and modality-specific artificial intelligence (AI) models are inflexible in real-world deployment and maintenance for biomedicine.
Ranked #1 on Text Summarization on MeQSum
no code implementations • 9 May 2023 • Lingjiao Chen, Matei Zaharia, James Zou
There is a rapidly growing number of large language models (LLMs) that users can query for a fee.
1 code implementation • 4 May 2023 • Weixin Liang, Yining Mao, Yongchan Kwon, Xinyu Yang, James Zou
Our work highlights the importance of understanding the nonlinear effects of model improvement on performance in different subpopulations, and has the potential to inform the development of more equitable and responsible machine learning models.
1 code implementation • 1 May 2023 • Shirley Wu, Mert Yuksekgonul, Linjun Zhang, James Zou
Deep neural networks often rely on spurious correlations to make predictions, which hinders generalization beyond training environments.
1 code implementation • 29 Apr 2023 • Zachary Izzo, Ruishan Liu, James Zou
To do this, simple parametric models are frequently used (e.g., the coefficients of a linear regression) but are usually fitted on the whole dataset.
no code implementations • 21 Apr 2023 • Jiaxi Yang, Wenglong Deng, Benlin Liu, Yangsibo Huang, James Zou, Xiaoxiao Li
Specifically, we introduce Generative Model Valuator (GMValuator), the first training-free and model-agnostic approach to provide data valuation for generation tasks.
2 code implementations • 16 Apr 2023 • Yongchan Kwon, James Zou
As a result, applying it to large datasets has been considered infeasible.
2 code implementations • 8 Apr 2023 • Yuzhen Mao, Zhun Deng, Huaxiu Yao, Ting Ye, Kenji Kawaguchi, James Zou
As machine learning has been deployed ubiquitously across applications in modern data science, algorithmic fairness has become a great concern.
2 code implementations • 6 Apr 2023 • Weixin Liang, Mert Yuksekgonul, Yining Mao, Eric Wu, James Zou
In this study, we evaluate the performance of several widely-used GPT detectors using writing samples from native and non-native English writers.
1 code implementation • bioRxiv 2023 • Zhi Huang, Federico Bianchi, Mert Yuksekgonul, Thomas Montine, James Zou
This is the largest public dataset for pathology images annotated with natural text.
1 code implementation • 13 Feb 2023 • Ryumei Nakada, Halil Ibrahim Gulluk, Zhun Deng, Wenlong Ji, James Zou, Linjun Zhang
We show that the algorithm can detect the ground-truth pairs and improve performance by fully exploiting unpaired datasets.
1 code implementation • 8 Feb 2023 • Yuhui Zhang, Jeff Z. HaoChen, Shih-Cheng Huang, Kuan-Chieh Wang, James Zou, Serena Yeung
Our proposed method can discover high-error data slices, identify influential attributes and further rectify undesirable model behaviors, without requiring any visual data.
no code implementations • 1 Feb 2023 • Roxana Daneshjou, Mert Yuksekgonul, Zhuo Ran Cai, Roberto Novoa, James Zou
To provide a medical dataset densely annotated by domain experts with annotations useful across multiple disease processes, we developed SkinCon: a skin disease dataset densely annotated by dermatologists.
1 code implementation • 28 Nov 2022 • Puheng Li, James Zou, Linjun Zhang
Several group fairness notions and algorithms have been proposed.
no code implementations • 12 Nov 2022 • Zachary Izzo, Jinsung Yoon, Sercan O. Arik, James Zou
However, DP's strong theoretical guarantees often come at the cost of a large drop in its utility for machine learning, and DP guarantees themselves can be difficult to interpret.
1 code implementation • 7 Nov 2022 • Federico Bianchi, Pratyusha Kalluri, Esin Durmus, Faisal Ladhak, Myra Cheng, Debora Nozza, Tatsunori Hashimoto, Dan Jurafsky, James Zou, Aylin Caliskan
For example, we find cases of prompting for basic traits or social roles resulting in images reinforcing whiteness as ideal, prompting for occupations resulting in amplification of racial and gender disparities, and prompting for objects resulting in reification of American norms.
1 code implementation • 25 Oct 2022 • Rong Ma, Eric D. Sun, James Zou
Then it leverages the eigenscores to obtain a consensus visualization, which has much improved quality over the individual visualizations in capturing the underlying true data structure.
1 code implementation • 20 Oct 2022 • Haotian Ye, James Zou, Linjun Zhang
This opens a promising strategy to first train a feature learner rather than a classifier, and then perform linear probing (last layer retraining) in the test environment.
1 code implementation • 11 Oct 2022 • Huaxiu Yao, Yiping Wang, Linjun Zhang, James Zou, Chelsea Finn
In this paper, we propose a simple yet powerful algorithm, C-Mixup, to improve generalization on regression tasks.
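A minimal sketch of the core idea, under my own simplifying assumptions (the Gaussian sampling kernel and helper names are illustrative, not the paper's implementation): sample mixing partners with probability decaying in label distance, then mix inputs and labels as in standard mixup.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_partners(y, bandwidth=1.0):
    """For each example, sample a mixing partner with probability
    proportional to a Gaussian kernel on label distance, so examples
    with closer labels are more likely to be mixed together."""
    n = len(y)
    partners = np.empty(n, dtype=int)
    for i in range(n):
        w = np.exp(-((y - y[i]) ** 2) / (2 * bandwidth**2))
        w[i] = 0.0  # never mix an example with itself
        partners[i] = rng.choice(n, p=w / w.sum())
    return partners

def mixup(x, y, partners, alpha=2.0):
    """Standard mixup interpolation of inputs and (continuous) labels."""
    lam = rng.beta(alpha, alpha, size=len(y))
    j = partners
    x_mix = lam[:, None] * x + (1 - lam)[:, None] * x[j]
    y_mix = lam * y + (1 - lam) * y[j]
    return x_mix, y_mix

# Toy regression dataset: linear signal plus noise.
x = rng.normal(size=(100, 3))
y = x @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
x_mix, y_mix = mixup(x, y, sample_partners(y))
```

Restricting mixing to label-similar pairs avoids producing interpolated labels that fall far from any real target, which is the main pitfall of vanilla mixup on regression.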
no code implementations • 11 Oct 2022 • Nazneen Rajani, Weixin Liang, Lingjiao Chen, Meg Mitchell, James Zou
With the advent of Transformers, large language models (LLMs) have saturated well-known NLP benchmarks and leaderboards with high aggregate performance.
no code implementations • 11 Oct 2022 • Zhenbang Wu, Huaxiu Yao, Zhe Su, David M Liebovitz, Lucas M Glass, James Zou, Chelsea Finn, Jimeng Sun
However, newly approved drugs do not have much historical prescription data and cannot leverage existing drug recommendation methods.
1 code implementation • 4 Oct 2022 • Mert Yuksekgonul, Federico Bianchi, Pratyusha Kalluri, Dan Jurafsky, James Zou
ARO consists of Visual Genome Attribution, to test the understanding of objects' properties; Visual Genome Relation, to test for relational understanding; and COCO & Flickr30k-Order, to test for order sensitivity.
no code implementations • 3 Oct 2022 • Xinyi Zhao, Weixin Liang, James Zou
Data is the fuel powering AI and creates tremendous value for many domains.
no code implementations • 2 Oct 2022 • Prashnna K Gyawali, Xiaoxia Liu, James Zou, Zihuai He
Despite extensive recent efforts to define different feature importance metrics for deep learning models, we identified that inherent stochasticity in the design and training of deep learning models makes commonly used feature importance scores unstable.
1 code implementation • 27 Sep 2022 • Yongchan Kwon, James Zou
On several real-world datasets, we demonstrate that the influential features identified by WeightedSHAP are better able to recapitulate the model's predictions compared to the features identified by the Shapley value.
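The underlying idea can be sketched as a "semivalue": a weighted average of marginal contributions, where the Shapley value corresponds to one particular choice of weights. WeightedSHAP learns the weights; this toy sketch (my own naming and simplifications) just takes them as input and uses brute-force enumeration, so it is only feasible for a handful of features.

```python
from itertools import combinations
from math import comb

def semivalue(value_fn, n, weights):
    """Attribution for each of n features: a weighted average of the
    marginal contributions value(S | {i}) - value(S), where weights[k]
    is the weight on coalitions of size k (weights sum to 1).
    With weights[k] = 1/n for all k, this recovers the Shapley value."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Average marginal contribution over all size-k coalitions.
            contrib = sum(
                value_fn(set(S) | {i}) - value_fn(set(S))
                for S in combinations(others, k)
            ) / comb(n - 1, k)
            phi[i] += weights[k] * contrib
    return phi

# Toy additive value function: v(S) = sum of per-feature payoffs in S.
payoff = {0: 1.0, 1: 2.0, 2: -0.5}
v = lambda S: sum(payoff[j] for j in S)
n = 3
shapley = semivalue(v, n, [1 / n] * n)  # uniform weights -> Shapley value
```

For this additive game every semivalue recovers the per-feature payoffs exactly; the weighting only matters once features interact.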
no code implementations • 18 Sep 2022 • Lingjiao Chen, Matei Zaharia, James Zou
We further propose SEES, an algorithmic framework to characterize the distribution shift under SJS and to estimate a model's performance on new data without any labels.
1 code implementation • 18 Sep 2022 • Lingjiao Chen, Zhihua Jin, Sabri Eyuboglu, Christopher Ré, Matei Zaharia, James Zou
HAPI is the first large-scale dataset of ML API usages and is a unique resource for studying ML-as-a-service (MLaaS).
1 code implementation • 12 Sep 2022 • Kailas Vodrahalli, Justin Ko, Albert S. Chiou, Roberto Novoa, Abubakar Abid, Michelle Phung, Kiana Yekrang, Paige Petrone, James Zou, Roxana Daneshjou
To address this issue, we developed TrueImage 2.0, an artificial intelligence (AI) model for assessing patient photo quality for telemedicine and providing real-time feedback to patients for photo quality improvement.
1 code implementation • NeurIPS 2023 • Mark Mazumder, Colby Banbury, Xiaozhe Yao, Bojan Karlaš, William Gaviria Rojas, Sudnya Diamos, Greg Diamos, Lynn He, Alicia Parrish, Hannah Rose Kirk, Jessica Quaye, Charvi Rastogi, Douwe Kiela, David Jurado, David Kanter, Rafael Mosquera, Juan Ciro, Lora Aroyo, Bilge Acun, Lingjiao Chen, Mehul Smriti Raje, Max Bartolo, Sabri Eyuboglu, Amirata Ghorbani, Emmett Goodman, Oana Inel, Tariq Kane, Christine R. Kirkpatrick, Tzu-Sheng Kuo, Jonas Mueller, Tristan Thrush, Joaquin Vanschoren, Margaret Warren, Adina Williams, Serena Yeung, Newsha Ardalani, Praveen Paritosh, Lilith Bat-Leah, Ce Zhang, James Zou, Carole-Jean Wu, Cody Coleman, Andrew Ng, Peter Mattson, Vijay Janapa Reddi
Machine learning research has long focused on models rather than datasets, and prominent datasets are used for common ML tasks without regard to the breadth, difficulty, and faithfulness of the underlying problems.
1 code implementation • 30 Jun 2022 • Zhiying Zhu, Weixin Liang, James Zou
Motivated by this, we propose a novel task, dataset explanation.
3 code implementations • 9 Jun 2022 • Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, Ambrose Slone, Ameet Rahane, Anantharaman S. Iyer, Anders Andreassen, Andrea Madotto, Andrea Santilli, Andreas Stuhlmüller, Andrew Dai, Andrew La, Andrew Lampinen, Andy Zou, Angela Jiang, Angelica Chen, Anh Vuong, Animesh Gupta, Anna Gottardi, Antonio Norelli, Anu Venkatesh, Arash Gholamidavoodi, Arfa Tabassum, Arul Menezes, Arun Kirubarajan, Asher Mullokandov, Ashish Sabharwal, Austin Herrick, Avia Efrat, Aykut Erdem, Ayla Karakaş, B. Ryan Roberts, Bao Sheng Loe, Barret Zoph, Bartłomiej Bojanowski, Batuhan Özyurt, Behnam Hedayatnia, Behnam Neyshabur, Benjamin Inden, Benno Stein, Berk Ekmekci, Bill Yuchen Lin, Blake Howald, Bryan Orinion, Cameron Diao, Cameron Dour, Catherine Stinson, Cedrick Argueta, César Ferri Ramírez, Chandan Singh, Charles Rathkopf, Chenlin Meng, Chitta Baral, Chiyu Wu, Chris Callison-Burch, Chris Waites, Christian Voigt, Christopher D. Manning, Christopher Potts, Cindy Ramirez, Clara E. 
Rivera, Clemencia Siro, Colin Raffel, Courtney Ashcraft, Cristina Garbacea, Damien Sileo, Dan Garrette, Dan Hendrycks, Dan Kilman, Dan Roth, Daniel Freeman, Daniel Khashabi, Daniel Levy, Daniel Moseguí González, Danielle Perszyk, Danny Hernandez, Danqi Chen, Daphne Ippolito, Dar Gilboa, David Dohan, David Drakard, David Jurgens, Debajyoti Datta, Deep Ganguli, Denis Emelin, Denis Kleyko, Deniz Yuret, Derek Chen, Derek Tam, Dieuwke Hupkes, Diganta Misra, Dilyar Buzan, Dimitri Coelho Mollo, Diyi Yang, Dong-Ho Lee, Dylan Schrader, Ekaterina Shutova, Ekin Dogus Cubuk, Elad Segal, Eleanor Hagerman, Elizabeth Barnes, Elizabeth Donoway, Ellie Pavlick, Emanuele Rodola, Emma Lam, Eric Chu, Eric Tang, Erkut Erdem, Ernie Chang, Ethan A. Chi, Ethan Dyer, Ethan Jerzak, Ethan Kim, Eunice Engefu Manyasi, Evgenii Zheltonozhskii, Fanyue Xia, Fatemeh Siar, Fernando Martínez-Plumed, Francesca Happé, Francois Chollet, Frieda Rong, Gaurav Mishra, Genta Indra Winata, Gerard de Melo, Germán Kruszewski, Giambattista Parascandolo, Giorgio Mariani, Gloria Wang, Gonzalo Jaimovitch-López, Gregor Betz, Guy Gur-Ari, Hana Galijasevic, Hannah Kim, Hannah Rashkin, Hannaneh Hajishirzi, Harsh Mehta, Hayden Bogar, Henry Shevlin, Hinrich Schütze, Hiromu Yakura, Hongming Zhang, Hugh Mee Wong, Ian Ng, Isaac Noble, Jaap Jumelet, Jack Geissinger, Jackson Kernion, Jacob Hilton, Jaehoon Lee, Jaime Fernández Fisac, James B. Simon, James Koppel, James Zheng, James Zou, Jan Kocoń, Jana Thompson, Janelle Wingfield, Jared Kaplan, Jarema Radom, Jascha Sohl-Dickstein, Jason Phang, Jason Wei, Jason Yosinski, Jekaterina Novikova, Jelle Bosscher, Jennifer Marsh, Jeremy Kim, Jeroen Taal, Jesse Engel, Jesujoba Alabi, Jiacheng Xu, Jiaming Song, Jillian Tang, Joan Waweru, John Burden, John Miller, John U. Balis, Jonathan Batchelder, Jonathan Berant, Jörg Frohberg, Jos Rozen, Jose Hernandez-Orallo, Joseph Boudeman, Joseph Guerr, Joseph Jones, Joshua B. Tenenbaum, Joshua S. 
Rule, Joyce Chua, Kamil Kanclerz, Karen Livescu, Karl Krauth, Karthik Gopalakrishnan, Katerina Ignatyeva, Katja Markert, Kaustubh D. Dhole, Kevin Gimpel, Kevin Omondi, Kory Mathewson, Kristen Chiafullo, Ksenia Shkaruta, Kumar Shridhar, Kyle McDonell, Kyle Richardson, Laria Reynolds, Leo Gao, Li Zhang, Liam Dugan, Lianhui Qin, Lidia Contreras-Ochando, Louis-Philippe Morency, Luca Moschella, Lucas Lam, Lucy Noble, Ludwig Schmidt, Luheng He, Luis Oliveros Colón, Luke Metz, Lütfi Kerem Şenel, Maarten Bosma, Maarten Sap, Maartje ter Hoeve, Maheen Farooqi, Manaal Faruqui, Mantas Mazeika, Marco Baturan, Marco Marelli, Marco Maru, Maria Jose Ramírez Quintana, Marie Tolkiehn, Mario Giulianelli, Martha Lewis, Martin Potthast, Matthew L. Leavitt, Matthias Hagen, Mátyás Schubert, Medina Orduna Baitemirova, Melody Arnaud, Melvin McElrath, Michael A. Yee, Michael Cohen, Michael Gu, Michael Ivanitskiy, Michael Starritt, Michael Strube, Michał Swędrowski, Michele Bevilacqua, Michihiro Yasunaga, Mihir Kale, Mike Cain, Mimee Xu, Mirac Suzgun, Mitch Walker, Mo Tiwari, Mohit Bansal, Moin Aminnaseri, Mor Geva, Mozhdeh Gheini, Mukund Varma T, Nanyun Peng, Nathan A. Chi, Nayeon Lee, Neta Gur-Ari Krakover, Nicholas Cameron, Nicholas Roberts, Nick Doiron, Nicole Martinez, Nikita Nangia, Niklas Deckers, Niklas Muennighoff, Nitish Shirish Keskar, Niveditha S. Iyer, Noah Constant, Noah Fiedel, Nuan Wen, Oliver Zhang, Omar Agha, Omar Elbaghdadi, Omer Levy, Owain Evans, Pablo Antonio Moreno Casares, Parth Doshi, Pascale Fung, Paul Pu Liang, Paul Vicol, Pegah Alipoormolabashi, Peiyuan Liao, Percy Liang, Peter Chang, Peter Eckersley, Phu Mon Htut, Pinyu Hwang, Piotr Miłkowski, Piyush Patil, Pouya Pezeshkpour, Priti Oli, Qiaozhu Mei, Qing Lyu, Qinlang Chen, Rabin Banjade, Rachel Etta Rudolph, Raefer Gabriel, Rahel Habacker, Ramon Risco, Raphaël Millière, Rhythm Garg, Richard Barnes, Rif A. 
Saurous, Riku Arakawa, Robbe Raymaekers, Robert Frank, Rohan Sikand, Roman Novak, Roman Sitelew, Ronan LeBras, Rosanne Liu, Rowan Jacobs, Rui Zhang, Ruslan Salakhutdinov, Ryan Chi, Ryan Lee, Ryan Stovall, Ryan Teehan, Rylan Yang, Sahib Singh, Saif M. Mohammad, Sajant Anand, Sam Dillavou, Sam Shleifer, Sam Wiseman, Samuel Gruetter, Samuel R. Bowman, Samuel S. Schoenholz, Sanghyun Han, Sanjeev Kwatra, Sarah A. Rous, Sarik Ghazarian, Sayan Ghosh, Sean Casey, Sebastian Bischoff, Sebastian Gehrmann, Sebastian Schuster, Sepideh Sadeghi, Shadi Hamdan, Sharon Zhou, Shashank Srivastava, Sherry Shi, Shikhar Singh, Shima Asaadi, Shixiang Shane Gu, Shubh Pachchigar, Shubham Toshniwal, Shyam Upadhyay, Shyamolima, Debnath, Siamak Shakeri, Simon Thormeyer, Simone Melzi, Siva Reddy, Sneha Priscilla Makini, Soo-Hwan Lee, Spencer Torene, Sriharsha Hatwar, Stanislas Dehaene, Stefan Divic, Stefano Ermon, Stella Biderman, Stephanie Lin, Stephen Prasad, Steven T. Piantadosi, Stuart M. Shieber, Summer Misherghi, Svetlana Kiritchenko, Swaroop Mishra, Tal Linzen, Tal Schuster, Tao Li, Tao Yu, Tariq Ali, Tatsu Hashimoto, Te-Lin Wu, Théo Desbordes, Theodore Rothschild, Thomas Phan, Tianle Wang, Tiberius Nkinyili, Timo Schick, Timofei Kornev, Titus Tunduny, Tobias Gerstenberg, Trenton Chang, Trishala Neeraj, Tushar Khot, Tyler Shultz, Uri Shaham, Vedant Misra, Vera Demberg, Victoria Nyamai, Vikas Raunak, Vinay Ramasesh, Vinay Uday Prabhu, Vishakh Padmakumar, Vivek Srikumar, William Fedus, William Saunders, William Zhang, Wout Vossen, Xiang Ren, Xiaoyu Tong, Xinran Zhao, Xinyi Wu, Xudong Shen, Yadollah Yaghoobzadeh, Yair Lakretz, Yangqiu Song, Yasaman Bahri, Yejin Choi, Yichi Yang, Yiding Hao, Yifu Chen, Yonatan Belinkov, Yu Hou, Yufang Hou, Yuntao Bai, Zachary Seid, Zhuoye Zhao, Zijian Wang, Zijie J. Wang, ZiRui Wang, Ziyi Wu
BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models.
no code implementations • 6 Jun 2022 • Zhun Deng, Jiayao Zhang, Linjun Zhang, Ting Ye, Yates Coley, Weijie J. Su, James Zou
Specifically, FIFA encourages both classification and fairness generalization and can be flexibly combined with many existing fair learning methods with logits-based losses.
no code implementations • 31 May 2022 • Mert Yuksekgonul, Maggie Wang, James Zou
When concept annotations are not available on the training data, we show that PCBM can transfer concepts from other datasets or from natural language descriptions of concepts via multimodal models.
no code implementations • 11 May 2022 • Jaime Roquero Gimenez, James Zou
Developing deep generative models that flexibly incorporate diverse measures of probability distance is an important area of research.
no code implementations • 10 May 2022 • Prashnna K Gyawali, Yann Le Guen, Xiaoxia Liu, Hua Tang, James Zou, Zihuai He
This can lead to biases in the risk predictors resulting in poor generalization when applied to minority populations and admixed individuals such as African Americans.
2 code implementations • ICLR 2022 • Sabri Eyuboglu, Maya Varma, Khaled Saab, Jean-Benoit Delbrouck, Christopher Lee-Messer, Jared Dunnmon, James Zou, Christopher Ré
In this work, we address these challenges by first designing a principled evaluation framework that enables a quantitative comparison of SDMs across 1,235 slice discovery settings in three input domains (natural images, medical images, and time-series data).
no code implementations • 15 Mar 2022 • Roxana Daneshjou, Kailas Vodrahalli, Roberto A Novoa, Melissa Jenkins, Weixin Liang, Veronica Rotemberg, Justin Ko, Susan M Swetter, Elizabeth E Bailey, Olivier Gevaert, Pritam Mukherjee, Michelle Phung, Kiana Yekrang, Bradley Fong, Rachna Sahasrabudhe, Johan A. C. Allerup, Utako Okata-Karigane, James Zou, Albert Chiou
To ascertain potential biases in algorithm performance in this context, we curated the Diverse Dermatology Images (DDI) dataset, the first publicly available, expertly curated, and pathologically confirmed image dataset with diverse skin tones.
2 code implementations • 3 Mar 2022 • Weixin Liang, Yuhui Zhang, Yongchan Kwon, Serena Yeung, James Zou
Our systematic analysis demonstrates that this gap is caused by a combination of model initialization and contrastive learning optimization.
1 code implementation • ICLR 2022 • Weixin Liang, James Zou
We present MetaShift, a collection of 12,868 sets of natural images across 410 classes, to address this challenge.
1 code implementation • 12 Feb 2022 • Kailas Vodrahalli, Tobias Gerstenberg, James Zou
In this paper, we present an initial exploration suggesting that showing AI models as more confident than they actually are, even when the original AI is well-calibrated, can improve human-AI performance (measured as the accuracy and confidence of the human's final prediction after seeing the AI advice).
no code implementations • 26 Jan 2022 • Yongchan Kwon, Antonio Ginart, James Zou
We introduce a new environment that allows ML predictors to use active learning algorithms to purchase labeled data within their budgets while competing against each other to attract users.
no code implementations • 4 Jan 2022 • Antonio Ginart, Laurens van der Maaten, James Zou, Chuan Guo
Recent data-extraction attacks have exposed that language models can memorize some training samples verbatim.
2 code implementations • 2 Jan 2022 • Huaxiu Yao, Yu Wang, Sai Li, Linjun Zhang, Weixin Liang, James Zou, Chelsea Finn
Machine learning algorithms typically assume that training and test examples are drawn from the same distribution.
no code implementations • 13 Dec 2021 • Zachary Izzo, James Zou, Lexing Ying
A recent line of work has focused on training machine learning (ML) models in the performative setting, i.e., when the data distribution reacts to the deployed model.
no code implementations • 15 Nov 2021 • Roxana Daneshjou, Kailas Vodrahalli, Weixin Liang, Roberto A Novoa, Melissa Jenkins, Veronica Rotemberg, Justin Ko, Susan M Swetter, Elizabeth E Bailey, Olivier Gevaert, Pritam Mukherjee, Michelle Phung, Kiana Yekrang, Bradley Fong, Rachna Sahasrabudhe, James Zou, Albert Chiou
AI diagnostic tools may aid in early skin cancer detection; however, most models have not been assessed on images of diverse skin tones or uncommon diseases.
no code implementations • 12 Nov 2021 • Eric Wu, Kevin Wu, James Zou
Medical AI algorithms can often experience degraded performance when evaluated on previously unseen sites.
no code implementations • 10 Nov 2021 • Amirata Ghorbani, Dina Berenbaum, Maor Ivgi, Yuval Dafna, James Zou
We address this limitation by introducing Feature Vectors, a new global interpretability method designed for tabular datasets.
2 code implementations • 26 Oct 2021 • Yongchan Kwon, James Zou
Data Shapley has recently been proposed as a principled framework to quantify the contribution of individual data points in machine learning.
no code implementations • 13 Oct 2021 • Bryan He, Matthew Thomson, Meena Subramaniam, Richard Perez, Chun Jimmie Ye, James Zou
Predicting phenotype from scRNA-seq is challenging for standard machine learning methods -- the number of cells measured can vary by orders of magnitude across individuals and the cell populations are also highly heterogeneous.
1 code implementation • CVPR 2022 • Tarek Naous, Srinjay Sarkar, Abubakar Abid, James Zou
We describe the method and compare it to ten other clustering methods on synthetic data to illustrate its advantages and disadvantages.
no code implementations • 6 Oct 2021 • Wenlong Ji, Zhun Deng, Ryumei Nakada, James Zou, Linjun Zhang
Contrastive learning has achieved state-of-the-art performance in various self-supervised learning tasks and even outperforms its supervised counterpart.
no code implementations • ICLR 2022 • Lingjiao Chen, Matei Zaharia, James Zou
ML prediction APIs from providers like Amazon and Google have made it simple to use ML in applications.
no code implementations • NeurIPS Workshop ICBINB 2021 • Yuhui Zhang, Hao Ding, Zeren Shui, Yifei Ma, James Zou, Anoop Deoras, Hao Wang
Pre-trained language models (PLMs) such as BERT and GPT learn general text representations and encode extensive world knowledge; thus, they can be efficiently and accurately adapted to various downstream tasks.
no code implementations • 29 Jul 2021 • Lingjiao Chen, Tracy Cai, Matei Zaharia, James Zou
This motivated us to formulate the API shift assessment problem at a more fine-grained level as estimating how the API model's confusion matrix changes over time when the data distribution is constant.
1 code implementation • 14 Jul 2021 • Kailas Vodrahalli, Roxana Daneshjou, Tobias Gerstenberg, James Zou
In decision support applications of AI, the AI algorithm's output is framed as a suggestion to a human user.
1 code implementation • 24 Jun 2021 • Abubakar Abid, Mert Yuksekgonul, James Zou
Understanding and explaining the mistakes made by trained models is critical to many machine learning objectives, such as improving robustness, addressing concept drift, and mitigating biases.
no code implementations • 18 Jun 2021 • Farzan Farnia, Amirali Aghazadeh, James Zou, David Tse
Robust training methods against perturbations to the input data have received great attention in the machine learning literature.
no code implementations • NeurIPS 2021 • Zhun Deng, Linjun Zhang, Kailas Vodrahalli, Kenji Kawaguchi, James Zou
Recent works empirically demonstrate that adversarial training in the source data can improve the ability of models to transfer to new domains.
no code implementations • 28 Apr 2021 • Antonio Ginart, Martin Zhang, James Zou
Post-deployment monitoring of ML systems is critical for ensuring reliability, especially as new user inputs can differ from the training distribution.
no code implementations • 16 Apr 2021 • Amirata Ghorbani, James Zou, Andre Esteva
In this work, we introduce Active Data Shapley (ADS) -- a filtering layer for batch active learning that significantly increases the efficiency of active learning by pre-selecting, using a linear time computation, the highest-value points from an unlabeled dataset.
no code implementations • 18 Feb 2021 • Lingjiao Chen, Matei Zaharia, James Zou
In this work, we propose FrugalMCT, a principled framework that adaptively selects the APIs to use for different data in an online fashion while respecting the user's budget.
1 code implementation • 15 Feb 2021 • Zachary Izzo, Lexing Ying, James Zou
Performative distribution shift captures the setting where the choice of which ML model is deployed changes the data distribution.
no code implementations • 11 Feb 2021 • Linjun Zhang, Zhun Deng, Kenji Kawaguchi, James Zou
In addition, we study how Mixup improves calibration in semi-supervised learning.
1 code implementation • 14 Jan 2021 • Abubakar Abid, Maheen Farooqi, James Zou
It has been observed that large-scale language models capture undesirable societal biases, e.g., relating to race and gender; yet religious bias has been relatively unexplored.
1 code implementation • 21 Nov 2020 • Weixin Liang, James Zou
A key challenge of neural group testing is to modify a deep neural network so that it can test multiple samples in one forward pass.
no code implementations • 15 Oct 2020 • Siyi Tang, Amirata Ghorbani, Rikiya Yamashita, Sameer Rehman, Jared A. Dunnmon, James Zou, Daniel L. Rubin
In this study, we used Data Shapley, a data valuation metric, to quantify the value of training data to the performance of a pneumonia detection algorithm in a large chest X-ray dataset.
no code implementations • ICLR 2021 • Linjun Zhang, Zhun Deng, Kenji Kawaguchi, Amirata Ghorbani, James Zou
For robustness, we show that minimizing the Mixup loss corresponds to approximately minimizing an upper bound of the adversarial loss.
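The Mixup scheme analyzed here forms convex combinations of example pairs and their labels. A minimal NumPy sketch of the augmentation step (an illustration of standard Mixup, not the paper's code; the robustness and calibration analysis is separate):

```python
import numpy as np

def mixup_batch(x, y, alpha=1.0, rng=None):
    """Mixup: convex combinations of example pairs and their one-hot labels."""
    if rng is None:
        rng = np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)        # mixing coefficient drawn from Beta(alpha, alpha)
    perm = rng.permutation(len(x))      # random partner for each example
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y + (1 - lam) * y[perm]
    return x_mix, y_mix

x = np.eye(4)   # four toy inputs
y = np.eye(4)   # their one-hot labels
x_mix, y_mix = mixup_batch(x, y)
```

Because each mixed label is a convex combination of two one-hot vectors, it remains a valid probability distribution, which is what connects Mixup to the calibration behavior studied in the paper.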
no code implementations • 1 Oct 2020 • Kailas Vodrahalli, Roxana Daneshjou, Roberto A Novoa, Albert Chiou, Justin M Ko, James Zou
These promising results suggest that our solution is feasible and can improve the quality of teledermatology care.
no code implementations • EMNLP 2020 • Weixin Liang, James Zou, Zhou Yu
We propose Active Learning with Contrastive Explanations (ALICE), an expert-in-the-loop training framework that utilizes contrastive natural language explanations to improve data efficiency in learning.
no code implementations • 15 Sep 2020 • Antonio Ginart, Eva Zhang, Yongchan Kwon, James Zou
A service that is more often queried by users, perhaps because it more accurately anticipates user preferences, is also more likely to obtain additional user data (e.g., in the form of a Yelp review).
1 code implementation • 26 Jul 2020 • Huaxiu Yao, Long-Kai Huang, Linjun Zhang, Ying Wei, Li Tian, James Zou, Junzhou Huang, Zhenhui Li
Moreover, both MetaMix and Channel Shuffle outperform state-of-the-art results by a large margin across many datasets and are compatible with existing meta-learning algorithms.
no code implementations • 2 Jul 2020 • Yongchan Kwon, Manuel A. Rivas, James Zou
Distributional data Shapley value (DShapley) has recently been proposed as a principled framework to quantify the contribution of individual data points in machine learning.
no code implementations • 15 Jun 2020 • Zhun Deng, Linjun Zhang, Amirata Ghorbani, James Zou
In this work, we investigate how adversarial robustness can be enhanced by leveraging out-of-domain unlabeled data.
no code implementations • NeurIPS 2020 • Lingjiao Chen, Matei Zaharia, James Zou
Prediction APIs offered for a fee are a fast-growing industry and an important part of machine learning as a service.
6 code implementations • NeurIPS 2020 • Tianhe Yu, Garrett Thomas, Lantao Yu, Stefano Ermon, James Zou, Sergey Levine, Chelsea Finn, Tengyu Ma
We also characterize the trade-off between the gain and risk of leaving the support of the batch data.
1 code implementation • ACL 2020 • Weixin Liang, James Zou, Zhou Yu
Our experiments show that CMADE achieves 89.2% accuracy in the dialog comparison task.
no code implementations • 8 Mar 2020 • Abubakar Abid, James Zou
Systematic experiments on image segmentation and text tagging demonstrate the strong performance of ECN in improving training on noisy structured labels.
no code implementations • ICML 2020 • Amirata Ghorbani, Michael P. Kim, James Zou
Shapley value is a classic notion from game theory, historically used to quantify the contributions of individuals within groups, and more recently applied to assign values to data points when training machine learning models.
no code implementations • 24 Feb 2020 • Zachary Izzo, Mary Anne Smart, Kamalika Chaudhuri, James Zou
Deleting data from a trained machine learning (ML) model is a critical task in many applications.
1 code implementation • NeurIPS 2020 • Amirata Ghorbani, James Zou
We develop Neuron Shapley as a new framework to quantify the contribution of individual neurons to the prediction and performance of a deep network.
no code implementations • 9 Oct 2019 • Gal Yona, Amirata Ghorbani, James Zou
We propose Extended Shapley as a principled framework for this problem, and experiment empirically with how it can be used to address questions of ML accountability.
no code implementations • ICLR 2020 • Ruishan Liu, Akshay Balsubramani, James Zou
Optimal transport (OT) is a principled approach to align datasets, but a key challenge in applying OT is that we need to specify a transport cost function that accurately captures how the two datasets are related.
6 code implementations • 25 Sep 2019 • Antonio Ginart, Maxim Naumov, Dheevatsa Mudigere, Jiyan Yang, James Zou
Embedding representations power machine intelligence in many applications, including recommendation systems, but they are space intensive -- potentially occupying hundreds of gigabytes in large-scale settings.
1 code implementation • 24 Sep 2019 • Allen Nie, Arturo L. Pineda, Matt W. Wright, Hannah Wand, Bryan Wulf, Helio A. Costa, Ronak Y. Patel, Carlos D. Bustamante, James Zou
In collaboration with the Clinical Genomic Resource (ClinGen)---the flagship NIH program for clinical curation---we propose the first machine learning system, LitGen, that can retrieve papers for a particular variant and filter them by specific evidence types used by curators to assess for pathogenicity.
4 code implementations • NeurIPS 2019 • Antonio Ginart, Melody Y. Guan, Gregory Valiant, James Zou
Intense recent discussions have focused on how to provide individuals with control over when their data can and cannot be used --- the EU's Right To Be Forgotten regulation is an example of this effort.
1 code implementation • 6 Jun 2019 • Abubakar Abid, Ali Abdalla, Ali Abid, Dawood Khan, Abdulrahman Alfozan, James Zou
Their feedback identified that Gradio should support a variety of interfaces and frameworks, allow for easy sharing of the interface, allow for input manipulation and interactive inference by the domain expert, as well as allow embedding the interface in iPython notebooks.
no code implementations • 29 May 2019 • Jaime Roquero Gimenez, James Zou
Most of the work in this domain has focused on identifying globally relevant features, which are features that are related to the outcome using evidence across the entire dataset.
no code implementations • 27 May 2019 • Maulik R. Kamdar, Tymor Hamamsy, Shea Shelton, Ayin Vala, Tome Eftimov, James Zou, Suzanne Tamang
Statistical learning methods that use data from multiple clinical centers across the US are needed to detect opioid over-prescribing trends and predict possible opioid misuse.
5 code implementations • 5 Apr 2019 • Amirata Ghorbani, James Zou
As data becomes the fuel driving technological and economic growth, a fundamental challenge is how to quantify the value of data in algorithmic predictions and decisions.
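Data Shapley values each training point by its average marginal contribution to model performance across orderings of the data. A tiny exact version is sketched below with a toy utility function of my own choosing (the paper uses Monte Carlo and gradient-based approximations, since enumerating all orderings is feasible only for a handful of points):

```python
import itertools
import math
import numpy as np

def shapley_values(points, utility):
    """Exact data Shapley: average each point's marginal contribution to the
    utility over all orderings of the data (feasible only for tiny n)."""
    n = len(points)
    values = np.zeros(n)
    for perm in itertools.permutations(range(n)):
        subset, prev = [], utility([])
        for i in perm:
            subset.append(points[i])
            cur = utility(subset)
            values[i] += cur - prev   # marginal contribution of point i
            prev = cur
    return values / math.factorial(n)

# Toy utility (an assumption for illustration): negative squared error
# of the subset mean against a target value; empty subset predicts 0.
TARGET = 1.0
def utility(subset):
    pred = np.mean(subset) if subset else 0.0
    return -(pred - TARGET) ** 2

vals = shapley_values([0.9, 1.1, 5.0], utility)
```

By the Shapley efficiency property, the values sum exactly to the performance gain of the full dataset over the empty set, which is what makes the scores interpretable as a decomposition of model performance.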
1 code implementation • NAACL 2019 • Dorottya Demszky, Nikhil Garg, Rob Voigt, James Zou, Matthew Gentzkow, Jesse Shapiro, Dan Jurafsky
We provide an NLP framework to uncover four linguistic dimensions of political polarization in social media: topic choice, framing, affect and illocutionary force.
1 code implementation • 12 Feb 2019 • Abubakar Abid, James Zou
The cVAE explicitly models latent features that are shared between the datasets, as well as those that are enriched in one dataset relative to the other, which allows the algorithm to isolate and enhance the salient latent features.
2 code implementations • NeurIPS 2019 • Amirata Ghorbani, James Wexler, James Zou, Been Kim
Interpretability has become an important topic of research as more machine learning (ML) models are deployed and widely used to make important decisions.
2 code implementations • 27 Jan 2019 • Abubakar Abid, Muhammad Fatih Balin, James Zou
We introduce the concrete autoencoder, an end-to-end differentiable method for global feature selection, which efficiently identifies a subset of the most informative features and simultaneously learns a neural network to reconstruct the input data from the selected features.
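The differentiable selection at the heart of the concrete autoencoder relies on the concrete (Gumbel-softmax) relaxation: each selector neuron holds learnable logits over the input features and outputs an approximately one-hot weighting that sharpens as the temperature anneals toward zero. A forward-pass-only NumPy sketch (the real method trains the logits end-to-end with a reconstruction network):

```python
import numpy as np

def concrete_select(logits, temperature, rng=None):
    """One concrete-selector neuron: a differentiable, approximately one-hot
    weighting over input features. With rng=None the Gumbel noise is omitted,
    giving the deterministic low-temperature limit (a sharp softmax)."""
    if rng is None:
        g = 0.0
    else:
        g = -np.log(-np.log(rng.uniform(size=logits.shape)))  # Gumbel noise
    z = (logits + g) / temperature
    e = np.exp(z - z.max())          # numerically stable softmax
    return e / e.sum()

logits = np.array([2.0, 0.1, -1.0, 0.5])   # learnable in the real method
w = concrete_select(logits, temperature=0.5, rng=np.random.default_rng(0))
w_sharp = concrete_select(logits, temperature=0.01)   # noise-free limit
```

At high temperature the weights spread across features (keeping gradients informative); at low temperature they collapse onto a single feature, which is what turns the soft weighting into hard feature selection.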
Ranked #1 on General Classification on Fashion-MNIST
no code implementations • 29 Nov 2018 • Yuhui Zhang, Allen Nie, James Zou
We compare the performance of our model with several baselines in a challenging cross-hospital setting with substantial domain shift.
1 code implementation • 1 Nov 2018 • Bryan He, James Zou
In classification, the de facto method for aggregating individual losses is the average loss.
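As a generic illustration of what is at stake in the choice of aggregator (not necessarily the paper's proposal), different aggregations place increasing weight on the hardest examples:

```python
import numpy as np

losses = np.array([0.1, 0.2, 0.1, 3.0])   # per-example losses; one hard example

avg_loss = losses.mean()                   # de facto average aggregation
max_loss = losses.max()                    # worst-case aggregation
topk_loss = np.sort(losses)[-2:].mean()    # average over the k=2 hardest examples
```

The average can hide a badly misclassified example behind many easy ones, while max and average-top-k aggregation force the model to attend to it; this trade-off between average-case and worst-case performance is the axis the paper studies.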
no code implementations • 31 Oct 2018 • Abdi-Hakin Dirie, Abubakar Abid, James Zou
We introduce Contrastive Multivariate Singular Spectrum Analysis, a novel unsupervised method for dimensionality reduction and signal decomposition of time series data.
no code implementations • 26 Oct 2018 • Jaime Roquero Gimenez, James Zou
The Model-X knockoff procedure has recently emerged as a powerful approach for feature selection with statistical guarantees.
no code implementations • NeurIPS 2018 • Abubakar Abid, James Zou
We define a flexible and differentiable family of warping metrics, which encompasses common metrics such as DTW, Euclidean, and edit distance.
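For reference, the classic DTW distance that this family encompasses is computed by a simple dynamic program over all monotone alignments of the two sequences (a textbook implementation, not the paper's differentiable generalization):

```python
import numpy as np

def dtw(a, b):
    """Dynamic-time-warping distance between two 1-D sequences,
    using squared difference as the local alignment cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # best of: insertion, deletion, or match step
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Warping absorbs a time shift that Euclidean distance would penalize.
x = [0, 0, 1, 2, 1, 0]
y = [0, 1, 2, 1, 0, 0]
```

Here `x` and `y` are the same bump shifted by one step, so their DTW distance is zero even though their pointwise Euclidean distance is not; making such metrics differentiable is what allows them to be learned.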
no code implementations • 17 Jul 2018 • Jaime Roquero Gimenez, Amirata Ghorbani, James Zou
This is often impossible to do from purely observational data, and a natural relaxation is to identify features that are correlated with the outcome even conditioned on all other observed features.
1 code implementation • 28 Jun 2018 • Allen Nie, Ashley Zehnder, Rodney L. Page, Arturo L. Pineda, Manuel A. Rivas, Carlos D. Bustamante, James Zou
However, clinicians lack the time and resource to annotate patient records with standard medical diagnostic codes and most veterinary visits are captured in free text notes.
1 code implementation • 31 May 2018 • Michael P. Kim, Amirata Ghorbani, James Zou
Prediction systems are successfully deployed in applications ranging from disease diagnosis, to predicting credit worthiness, to image recognition.
no code implementations • 5 Apr 2018 • Anvita Gupta, James Zou
We propose a novel feedback-loop architecture, called Feedback GAN (FBGAN), to optimize the synthetic gene sequences for desired properties using an external function analyzer.
no code implementations • 2 Apr 2018 • Abubakar Abid, James Zou
We consider the problem of inference in a linear regression model in which the relative ordering of the input features and output labels is not known.
1 code implementation • ICML 2018 • Kevin Tian, Teng Zhang, James Zou
However, in addition to the text data itself, we often have additional covariates associated with individual corpus documents---e.g., the demographic of the author, time and venue of publication---and we would like the embedding to naturally capture this information.
no code implementations • ICLR 2018 • Amirata Ghorbani, Abubakar Abid, James Zou
In this paper, we show that interpretation of deep learning predictions is extremely fragile in the following sense: two perceptively indistinguishable inputs with the same predicted label can be assigned very different interpretations.
no code implementations • ICLR 2018 • Allen Nie, Mihir Mongia, James Zou
Recently, a regularization method has been proposed to optimize the variational lower bound of the Information Bottleneck Lagrangian.
no code implementations • ICLR 2018 • Kevin Tian, Teng Zhang, James Zou
In addition to the text data itself, we often have additional covariates associated with individual documents in the corpus---e.g., the demographic of the author, time and venue of publication, etc.---and we would like the embedding to naturally capture the information of the covariates.
1 code implementation • 22 Nov 2017 • Nikhil Garg, Londa Schiebinger, Dan Jurafsky, James Zou
Word embeddings use vectors to represent words such that the geometry between vectors captures semantic relationships between the words.
1 code implementation • NeurIPS 2017 • Fei Xia, Martin J. Zhang, James Zou, David Tse
For example, in genetic association studies, each hypothesis tests the correlation between a variant and the trait.
2 code implementations • 29 Oct 2017 • Amirata Ghorbani, Abubakar Abid, James Zou
In this paper, we show that interpretation of deep learning predictions is extremely fragile in the following sense: two perceptively indistinguishable inputs with the same predicted label can be assigned very different interpretations.
1 code implementation • 18 Oct 2017 • Ruishan Liu, James Zou
We show that even in this very simple setting, the amount of memory kept can substantially affect the agent's performance.
1 code implementation • 20 Sep 2017 • Abubakar Abid, Martin J. Zhang, Vivek K. Bagaria, James Zou
We present a new technique called contrastive principal component analysis (cPCA) that is designed to discover low-dimensional structure that is unique to a dataset, or enriched in one dataset relative to other data.
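The core of cPCA is a one-line change to PCA: instead of the top eigenvectors of the target covariance alone, it takes the top eigenvectors of the target covariance minus a scaled background covariance. A minimal NumPy sketch on synthetic data (the data and the single fixed contrast parameter are illustrative assumptions; the paper explores a spectrum of alpha values):

```python
import numpy as np

def cpca_directions(target, background, alpha=1.0, k=1):
    """Contrastive PCA: top-k eigenvectors of cov(target) - alpha * cov(background),
    highlighting variance enriched in the target relative to the background."""
    contrast = np.cov(target, rowvar=False) - alpha * np.cov(background, rowvar=False)
    w, v = np.linalg.eigh(contrast)
    return v[:, np.argsort(w)[::-1][:k]]   # directions of largest contrastive variance

rng = np.random.default_rng(0)
# Background varies mostly along axis 0; the target adds extra variance on axis 1.
background = rng.normal(size=(500, 2)) * [3.0, 1.0]
target = rng.normal(size=(500, 2)) * [3.0, 1.0] + rng.normal(size=(500, 2)) * [0.0, 2.0]
d = cpca_directions(target, background, alpha=1.0)[:, 0]
```

Ordinary PCA on the target would return the high-variance background axis; subtracting the background covariance makes the target-specific axis dominate instead.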
no code implementations • 7 Aug 2017 • Xinkun Nie, Xiaoying Tian, Jonathan Taylor, James Zou
In this paper, we prove that when the data collection procedure satisfies natural conditions, sample means of the data have systematic negative biases.
no code implementations • ICML 2017 • Pengtao Xie, Yuntian Deng, Yi Zhou, Abhimanu Kumar, Yao-Liang Yu, James Zou, Eric P. Xing
The large model capacity of latent space models (LSMs) enables them to achieve great performance on various applications, but meanwhile renders LSMs to be prone to overfitting.
2 code implementations • ICML 2017 • Aditi Raghunathan, Greg Valiant, James Zou
We generalize this extrapolation and related unseen estimation problems to the multiple population setting, where population $j$ has an unknown distribution $D_j$ from which we observe $n_j$ samples.
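The classic single-population starting point for this kind of unseen estimation is the Good-Turing estimator, which estimates the probability mass of never-observed outcomes from the fraction of outcomes seen exactly once (a textbook illustration; the paper's multi-population generalization goes well beyond it):

```python
from collections import Counter

def good_turing_unseen(sample):
    """Good-Turing estimate of the total probability mass of unseen outcomes:
    the fraction of observations whose outcome appeared exactly once."""
    counts = Counter(sample)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(sample)

sample = ["a", "a", "a", "b", "b", "c", "d"]  # "c" and "d" are singletons
```

Many rare-but-seen outcomes signal that many more outcomes remain unseen; the multi-population setting asks the analogous question when samples come from several related distributions.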
no code implementations • WS 2017 • Shyam Upadhyay, Kai-Wei Chang, Matt Taddy, Adam Kalai, James Zou
We present a multi-view Bayesian non-parametric algorithm which improves multi-sense word embeddings by (a) using multilingual (i.e., more than two languages) corpora to significantly improve sense embeddings beyond what one achieves with bilingual information, and (b) using a principled approach to learn a variable number of senses per word, in a data-driven manner.
no code implementations • 3 May 2017 • Abubakar Abid, Ada Poon, James Zou
We study the regimes in which each estimator excels, and generalize the estimators to the setting where partial ordering information is available in the form of experiments replicated independently.
8 code implementations • NeurIPS 2016 • Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, Adam Kalai
Geometrically, gender bias is first shown to be captured by a direction in the word embedding.
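That geometric observation can be made concrete with toy vectors (the embeddings below are invented for illustration; the paper derives the direction from real trained embeddings using several gendered word pairs):

```python
import numpy as np

# Toy 2-D embedding vectors, chosen by hand for illustration only.
emb = {
    "he":     np.array([ 1.0, 0.2]),
    "she":    np.array([-1.0, 0.2]),
    "doctor": np.array([ 0.4, 0.9]),
    "nurse":  np.array([-0.5, 0.8]),
}

g = emb["he"] - emb["she"]
g = g / np.linalg.norm(g)        # unit "gender direction"

def gender_projection(word):
    """Signed component of a word's vector along the gender direction."""
    return float(emb[word] @ g)
```

A word with no gender content should project near zero on this direction; occupation words with nonzero projections are exactly the associations the debiasing method removes.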
no code implementations • 20 Jun 2016 • Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, Adam Kalai
Machine learning algorithms are optimized to model statistical properties of the training data.
no code implementations • 19 Jun 2016 • Akash Srivastava, James Zou, Ryan P. Adams, Charles Sutton
A good clustering can help a data analyst to explore and understand a data set, but what constitutes a good clustering may depend on domain-specific and application-specific criteria.
no code implementations • 20 May 2016 • Jonathan H. Huggins, James Zou
As an illustration, we apply our framework to derive finite-sample error bounds of approximate unadjusted Langevin dynamics.
no code implementations • 22 Feb 2016 • Akash Srivastava, James Zou, Charles Sutton
A good clustering can help a data analyst to explore and understand a data set, but what constitutes a good clustering may depend on domain-specific and application-specific criteria.
no code implementations • 16 Nov 2015 • Daniel Russo, James Zou
But while any data exploration renders standard statistical theory invalid, experience suggests that different types of exploratory analysis can lead to disparate levels of bias, and the degree of bias also depends on the particulars of the data set.
no code implementations • 14 Jul 2015 • Rong Ge, James Zou
In this paper, we develop the general framework of Rich Component Analysis (RCA) to model settings where the observations from different views are driven by different sets of latent components, and each component can be a complex, high-dimensional distribution.
no code implementations • 8 Jul 2015 • Rong Ge, James Zou
A plethora of algorithms have been developed to tackle NMF, but due to the non-convex nature of the problem, there is little guarantee on how well these methods work.