1 code implementation • 18 Apr 2024 • Bertie Vidgen, Adarsh Agrawal, Ahmed M. Ahmed, Victor Akinwande, Namir Al-Nuaimi, Najla Alfaraj, Elie Alhajjar, Lora Aroyo, Trupti Bavalatti, Borhane Blili-Hamelin, Kurt Bollacker, Rishi Bomassani, Marisa Ferrara Boston, Siméon Campos, Kal Chakra, Canyu Chen, Cody Coleman, Zacharie Delpierre Coudert, Leon Derczynski, Debojyoti Dutta, Ian Eisenberg, James Ezick, Heather Frase, Brian Fuller, Ram Gandikota, Agasthya Gangavarapu, Ananya Gangavarapu, James Gealy, Rajat Ghosh, James Goel, Usman Gohar, Sujata Goswami, Scott A. Hale, Wiebke Hutiri, Joseph Marvin Imperial, Surgan Jandial, Nick Judd, Felix Juefei-Xu, Foutse Khomh, Bhavya Kailkhura, Hannah Rose Kirk, Kevin Klyman, Chris Knotz, Michael Kuchnik, Shachi H. Kumar, Chris Lengerich, Bo Li, Zeyi Liao, Eileen Peters Long, Victor Lu, Yifan Mai, Priyanka Mary Mammen, Kelvin Manyeki, Sean McGregor, Virendra Mehta, Shafee Mohammed, Emanuel Moss, Lama Nachman, Dinesh Jinenhally Naganna, Amin Nikanjam, Besmira Nushi, Luis Oala, Iftach Orr, Alicia Parrish, Cigdem Patlak, William Pietri, Forough Poursabzi-Sangdeh, Eleonora Presani, Fabrizio Puletti, Paul Röttger, Saurav Sahay, Tim Santos, Nino Scherrer, Alice Schoenauer Sebag, Patrick Schramowski, Abolfazl Shahbazi, Vin Sharma, Xudong Shen, Vamsi Sistla, Leonard Tang, Davide Testuggine, Vithursan Thangarasa, Elizabeth Anne Watkins, Rebecca Weiss, Chris Welty, Tyler Wilbers, Adina Williams, Carole-Jean Wu, Poonam Yadav, Xianjun Yang, Yi Zeng, Wenhui Zhang, Fedor Zhdanov, Jiacheng Zhu, Percy Liang, Peter Mattson, Joaquin Vanschoren
We created a new taxonomy of 13 hazard categories, of which 7 have tests in the v0.5 benchmark.
2 code implementations • 9 Apr 2024 • Sebastian Bordt, Harsha Nori, Vanessa Rodrigues, Besmira Nushi, Rich Caruana
We then compare the few-shot learning performance of LLMs on datasets that were seen during training to the performance on datasets released after training.
1 code implementation • 24 Oct 2023 • Marah I Abdin, Suriya Gunasekar, Varun Chandrasekaran, Jerry Li, Mert Yuksekgonul, Rahee Ghosh Peshawaria, Ranjita Naik, Besmira Nushi
Motivated by rising concerns around factual incorrectness and hallucinations of LLMs, we present KITAB, a new dataset for measuring constraint satisfaction abilities of language models.
no code implementations • 11 Oct 2023 • Ranjita Naik, Varun Chandrasekaran, Mert Yuksekgonul, Hamid Palangi, Besmira Nushi
Large language models (LLMs) are documented to struggle in settings that require complex reasoning.
1 code implementation • 26 Sep 2023 • Mert Yuksekgonul, Varun Chandrasekaran, Erik Jones, Suriya Gunasekar, Ranjita Naik, Hamid Palangi, Ece Kamar, Besmira Nushi
We investigate the internal behavior of Transformer-based Large Language Models (LLMs) when they generate factually incorrect text.
no code implementations • 8 Apr 2023 • Yu Yang, Besmira Nushi, Hamid Palangi, Baharan Mirzasoleiman
Spurious correlations that degrade model generalization or lead the model to be right for the wrong reasons are one of the main robustness concerns for real-world deployments.
no code implementations • 30 Mar 2023 • Ranjita Naik, Besmira Nushi
In this paper, we take a multi-dimensional approach to studying and quantifying common social biases as reflected in the generated images, by focusing on how occupations, personality traits, and everyday situations are depicted across representations of (perceived) gender, age, race, and geographical location.
1 code implementation • 20 Dec 2022 • Tejas Gokhale, Hamid Palangi, Besmira Nushi, Vibhav Vineet, Eric Horvitz, Ece Kamar, Chitta Baral, Yezhou Yang
We investigate the ability of T2I models to generate correct spatial relationships among objects and present VISOR, an evaluation metric that captures how accurately the spatial relationship described in text is generated in the image.
no code implementations • 16 Aug 2022 • Kori Inkpen, Shreya Chappidi, Keri Mallari, Besmira Nushi, Divya Ramesh, Pietro Michelucci, Vani Mandava, Libuše Hannah Vepřek, Gabrielle Quinn
In addition, we found that users' perception of the AI's performance relative to their own also had a significant impact on whether their accuracy improved when given AI recommendations.
no code implementations • 19 May 2022 • Riccardo Fogliato, Shreya Chappidi, Matthew Lungren, Michael Fitzke, Mark Parkinson, Diane Wilson, Paul Fisher, Eric Horvitz, Kori Inkpen, Besmira Nushi
A critical aspect of interaction design for AI-assisted human decision making is the set of policies governing the display and sequencing of AI inferences within larger decision-making workflows.
no code implementations • 21 Feb 2022 • Andi Peng, Besmira Nushi, Emre Kiciman, Kori Inkpen, Ece Kamar
In AI-assisted decision-making, effective hybrid (human-AI) teamwork does not depend on AI performance alone, but also on the AI's impact on human decision-making.
1 code implementation • 14 Jul 2021 • Shruthi Bannur, Ozan Oktay, Melanie Bernhardt, Anton Schwaighofer, Rajesh Jena, Besmira Nushi, Sharan Wadhwani, Aditya Nori, Kal Natarajan, Shazad Ashraf, Javier Alvarez-Valle, Daniel C. Castro
Chest radiography has been a recommended procedure for patient triaging and resource management in intensive care units (ICUs) throughout the COVID-19 pandemic.
1 code implementation • CVPR 2021 • Sahil Singla, Besmira Nushi, Shital Shah, Ece Kamar, Eric Horvitz
Traditional evaluation metrics for learned models that report aggregate scores over a test set are insufficient for surfacing important and informative patterns of failure over features and instances.
no code implementations • 11 Aug 2020 • Megha Srivastava, Besmira Nushi, Ece Kamar, Shital Shah, Eric Horvitz
In many applications of machine learning (ML), updates are performed with the goal of enhancing model performance.
no code implementations • 26 Jun 2020 • Gagan Bansal, Tongshuang Wu, Joyce Zhou, Raymond Fok, Besmira Nushi, Ece Kamar, Marco Tulio Ribeiro, Daniel S. Weld
However, prior studies observed improvements from explanations only when the AI, alone, outperformed both the human and the best team.
no code implementations • 27 Apr 2020 • Gagan Bansal, Besmira Nushi, Ece Kamar, Eric Horvitz, Daniel S. Weld
To optimize the team performance for this setting we maximize the team's expected utility, expressed in terms of the quality of the final decision, cost of verifying, and individual accuracies of people and machines.
no code implementations • CVPR 2020 • Ramprasaath R. Selvaraju, Purva Tendulkar, Devi Parikh, Eric Horvitz, Marco Ribeiro, Besmira Nushi, Ece Kamar
We quantify the extent to which this phenomenon occurs by creating a new Reasoning split of the VQA dataset and collecting VQA-introspect, a new dataset consisting of 238K new perception questions that serve as sub-questions corresponding to the set of perceptual tasks needed to effectively answer the complex reasoning questions in the Reasoning split.
no code implementations • 8 Sep 2019 • Andi Peng, Besmira Nushi, Emre Kiciman, Kori Inkpen, Siddharth Suri, Ece Kamar
Although systematic biases in decision-making are widely documented, the ways in which they emerge from different sources are less understood.
no code implementations • 4 Jun 2019 • Gagan Bansal, Besmira Nushi, Ece Kamar, Dan Weld, Walter Lasecki, Eric Horvitz
We introduce the notion of the compatibility of an AI update with prior user experience and present methods for studying the role of compatibility in human-AI teams.
no code implementations • 12 May 2019 • Aditya Modi, Debadeepta Dey, Alekh Agarwal, Adith Swaminathan, Besmira Nushi, Sean Andrist, Eric Horvitz
We address the opportunity to maximize the utility of an overall computing system by employing reinforcement learning to guide the configuration of the set of interacting modules that comprise the system.
no code implementations • 19 Sep 2018 • Besmira Nushi, Ece Kamar, Eric Horvitz
We present Pandora, a set of hybrid human-machine methods and tools for describing and explaining system failures.
no code implementations • 24 Nov 2016 • Besmira Nushi, Ece Kamar, Eric Horvitz, Donald Kossmann
We study the problem of troubleshooting machine learning systems that rely on analytical pipelines of distinct components.
no code implementations • 8 Aug 2015 • Besmira Nushi, Adish Singla, Anja Gruenheid, Erfan Zamanian, Andreas Krause, Donald Kossmann
Based on this intuitive idea, we introduce the Access Path Model (APM), a novel crowd model that leverages the notion of access paths as an alternative way of retrieving information.