1 code implementation • 3 Apr 2024 • Hussein Mozannar, Valerie Chen, Mohammed Alsobay, Subhro Das, Sebastian Zhao, Dennis Wei, Manish Nagireddy, Prasanna Sattigeri, Ameet Talwalkar, David Sontag
Evaluation of large language models (LLMs) for code has primarily relied on static benchmarks, including HumanEval (Chen et al., 2021), which measure the ability of LLMs to generate complete code that passes unit tests.
1 code implementation • 7 Nov 2023 • Lindia Tjuatja, Valerie Chen, Sherry Tongshuang Wu, Ameet Talwalkar, Graham Neubig
As large language models (LLMs) become more capable, there is growing excitement about the possibility of using LLMs as proxies for humans in real-world tasks where subjective labels are desired, such as in surveys and opinion polling.
no code implementations • 25 Aug 2023 • Giang Nguyen, Valerie Chen, Mohammad Reza Taesiri, Anh Totti Nguyen
Nearest neighbors (NN) are traditionally used to compute final decisions, e. g., in Support Vector Machines or k-NN classifiers, and to provide users with explanations for the model's decision.
no code implementations • 28 Jul 2023 • Matthew Barker, Emma Kallina, Dhananjay Ashok, Katherine M. Collins, Ashley Casovan, Adrian Weller, Ameet Talwalkar, Valerie Chen, Umang Bhatt
We propose FeedbackLogs, addenda to existing documentation of ML pipelines, to track the input of multiple stakeholders.
no code implementations • 13 Apr 2023 • Umang Bhatt, Valerie Chen, Katherine M. Collins, Parameswaran Kamalaruban, Emma Kallina, Adrian Weller, Ameet Talwalkar
In this work, we propose learning a decision support policy that, for a given input, chooses which form of support, if any, to provide.
1 code implementation • 16 Feb 2023 • Joon Sik Kim, Valerie Chen, Danish Pruthi, Nihar B. Shah, Ameet Talwalkar
Many practical applications, ranging from paper-reviewer assignment in peer review to job-applicant matching for hiring, require human decision makers to identify relevant matches by combining their expertise with predictions from machine learning models.
no code implementations • 15 Feb 2023 • Ada Martin, Valerie Chen, Sérgio Jesus, Pedro Saleiro
We hope that this work motivates further study of when and how SimEvals should be used to aid in the design of real-world evaluations.
no code implementations • 18 Jan 2023 • Valerie Chen, Q. Vera Liao, Jennifer Wortman Vaughan, Gagan Bansal
AI explanations are often mentioned as a way to improve human-AI decision-making, but empirical studies have not found consistent evidence of explanations' effectiveness and, on the contrary, suggest that they can increase overreliance when the AI system is wrong.
no code implementations • 24 Jun 2022 • Kasun Amarasinghe, Kit T. Rodolfa, Sérgio Jesus, Valerie Chen, Vladimir Balayan, Pedro Saleiro, Pedro Bizarro, Ameet Talwalkar, Rayid Ghani
Most existing evaluations of explainable machine learning (ML) methods rely on simplifying assumptions or proxies that do not reflect real-world use cases; the handful of more robust evaluations on real-world settings have shortcomings in their design, resulting in limited conclusions of methods' real-world utility.
no code implementations • 5 Jun 2022 • Valerie Chen, Nari Johnson, Nicholay Topin, Gregory Plumb, Ameet Talwalkar
SimEvals involve training algorithmic agents that take as input the information content (such as model explanations) that would be presented to each participant in a human subject study, to predict answers to the use case of interest.
no code implementations • 13 May 2022 • Valerie Chen, Umang Bhatt, Hoda Heidari, Adrian Weller, Ameet Talwalkar
A practitioner may receive feedback from an expert at the observation- or domain-level, and convert this feedback into updates to the dataset, loss function, or parameter space.
no code implementations • 12 Dec 2021 • Keegan Harris, Valerie Chen, Joon Sik Kim, Ameet Talwalkar, Hoda Heidari, Zhiwei Steven Wu
While the decision maker's problem of finding the optimal Bayesian incentive-compatible (BIC) signaling policy takes the form of optimization over infinitely-many variables, we show that this optimization can be cast as a linear program over finitely-many regions of the space of possible assessment rules.
no code implementations • 10 Mar 2021 • Valerie Chen, Jeffrey Li, Joon Sik Kim, Gregory Plumb, Ameet Talwalkar
Despite increasing interest in the field of Interpretable Machine Learning (IML), a significant gap persists between the technical objectives targeted by researchers' methods and the high-level goals of consumers' use cases.
1 code implementation • ICLR 2021 • Valerie Chen, Abhinav Gupta, Kenneth Marino
We also find that incorporating natural language allows the model to generalize to unseen tasks in a zero-shot setting and to learn quickly from a few demonstrations.
no code implementations • 9 Jun 2019 • Valerie Chen, Man-Ki Yoon, Zhong Shao
Machine-learning driven safety-critical autonomous systems, such as self-driving cars, must be able to detect situations where its trained model is not able to make a trustworthy prediction.