Search Results for author: Besmira Nushi

Found 23 papers, 7 papers with code

Introducing v0.5 of the AI Safety Benchmark from MLCommons

1 code implementation18 Apr 2024 Bertie Vidgen, Adarsh Agrawal, Ahmed M. Ahmed, Victor Akinwande, Namir Al-Nuaimi, Najla Alfaraj, Elie Alhajjar, Lora Aroyo, Trupti Bavalatti, Borhane Blili-Hamelin, Kurt Bollacker, Rishi Bomassani, Marisa Ferrara Boston, Siméon Campos, Kal Chakra, Canyu Chen, Cody Coleman, Zacharie Delpierre Coudert, Leon Derczynski, Debojyoti Dutta, Ian Eisenberg, James Ezick, Heather Frase, Brian Fuller, Ram Gandikota, Agasthya Gangavarapu, Ananya Gangavarapu, James Gealy, Rajat Ghosh, James Goel, Usman Gohar, Sujata Goswami, Scott A. Hale, Wiebke Hutiri, Joseph Marvin Imperial, Surgan Jandial, Nick Judd, Felix Juefei-Xu, Foutse khomh, Bhavya Kailkhura, Hannah Rose Kirk, Kevin Klyman, Chris Knotz, Michael Kuchnik, Shachi H. Kumar, Chris Lengerich, Bo Li, Zeyi Liao, Eileen Peters Long, Victor Lu, Yifan Mai, Priyanka Mary Mammen, Kelvin Manyeki, Sean McGregor, Virendra Mehta, Shafee Mohammed, Emanuel Moss, Lama Nachman, Dinesh Jinenhally Naganna, Amin Nikanjam, Besmira Nushi, Luis Oala, Iftach Orr, Alicia Parrish, Cigdem Patlak, William Pietri, Forough Poursabzi-Sangdeh, Eleonora Presani, Fabrizio Puletti, Paul Röttger, Saurav Sahay, Tim Santos, Nino Scherrer, Alice Schoenauer Sebag, Patrick Schramowski, Abolfazl Shahbazi, Vin Sharma, Xudong Shen, Vamsi Sistla, Leonard Tang, Davide Testuggine, Vithursan Thangarasa, Elizabeth Anne Watkins, Rebecca Weiss, Chris Welty, Tyler Wilbers, Adina Williams, Carole-Jean Wu, Poonam Yadav, Xianjun Yang, Yi Zeng, Wenhui Zhang, Fedor Zhdanov, Jiacheng Zhu, Percy Liang, Peter Mattson, Joaquin Vanschoren

We created a new taxonomy of 13 hazard categories, of which 7 have tests in the v0. 5 benchmark.

Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models

2 code implementations9 Apr 2024 Sebastian Bordt, Harsha Nori, Vanessa Rodrigues, Besmira Nushi, Rich Caruana

We then compare the few-shot learning performance of LLMs on datasets that were seen during training to the performance on datasets released after training.

Few-Shot Learning Language Modelling +2

KITAB: Evaluating LLMs on Constraint Satisfaction for Information Retrieval

1 code implementation24 Oct 2023 Marah I Abdin, Suriya Gunasekar, Varun Chandrasekaran, Jerry Li, Mert Yuksekgonul, Rahee Ghosh Peshawaria, Ranjita Naik, Besmira Nushi

Motivated by rising concerns around factual incorrectness and hallucinations of LLMs, we present KITAB, a new dataset for measuring constraint satisfaction abilities of language models.

Information Retrieval Retrieval

Diversity of Thought Improves Reasoning Abilities of LLMs

no code implementations11 Oct 2023 Ranjita Naik, Varun Chandrasekaran, Mert Yuksekgonul, Hamid Palangi, Besmira Nushi

Large language models (LLMs) are documented to struggle in settings that require complex reasoning.

Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models

1 code implementation26 Sep 2023 Mert Yuksekgonul, Varun Chandrasekaran, Erik Jones, Suriya Gunasekar, Ranjita Naik, Hamid Palangi, Ece Kamar, Besmira Nushi

We investigate the internal behavior of Transformer-based Large Language Models (LLMs) when they generate factually incorrect text.

Mitigating Spurious Correlations in Multi-modal Models during Fine-tuning

no code implementations8 Apr 2023 Yu Yang, Besmira Nushi, Hamid Palangi, Baharan Mirzasoleiman

Spurious correlations that degrade model generalization or lead the model to be right for the wrong reasons are one of the main robustness concerns for real-world deployments.

Attribute

Social Biases through the Text-to-Image Generation Lens

no code implementations30 Mar 2023 Ranjita Naik, Besmira Nushi

In this paper, we take a multi-dimensional approach to studying and quantifying common social biases as reflected in the generated images, by focusing on how occupations, personality traits, and everyday situations are depicted across representations of (perceived) gender, age, race, and geographical location.

Descriptive Text-to-Image Generation

Benchmarking Spatial Relationships in Text-to-Image Generation

1 code implementation20 Dec 2022 Tejas Gokhale, Hamid Palangi, Besmira Nushi, Vibhav Vineet, Eric Horvitz, Ece Kamar, Chitta Baral, Yezhou Yang

We investigate the ability of T2I models to generate correct spatial relationships among objects and present VISOR, an evaluation metric that captures how accurately the spatial relationship described in text is generated in the image.

Benchmarking Text-to-Image Generation

Advancing Human-AI Complementarity: The Impact of User Expertise and Algorithmic Tuning on Joint Decision Making

no code implementations16 Aug 2022 Kori Inkpen, Shreya Chappidi, Keri Mallari, Besmira Nushi, Divya Ramesh, Pietro Michelucci, Vani Mandava, Libuše Hannah Vepřek, Gabrielle Quinn

In addition, we found that users' perception of the AI's performance relative on their own also had a significant impact on whether their accuracy improved when given AI recommendations.

Decision Making

Who Goes First? Influences of Human-AI Workflow on Decision Making in Clinical Imaging

no code implementations19 May 2022 Riccardo Fogliato, Shreya Chappidi, Matthew Lungren, Michael Fitzke, Mark Parkinson, Diane Wilson, Paul Fisher, Eric Horvitz, Kori Inkpen, Besmira Nushi

A critical aspect of interaction design for AI-assisted human decision making are policies about the display and sequencing of AI inferences within larger decision-making workflows.

Decision Making

Investigations of Performance and Bias in Human-AI Teamwork in Hiring

no code implementations21 Feb 2022 Andi Peng, Besmira Nushi, Emre Kiciman, Kori Inkpen, Ece Kamar

In AI-assisted decision-making, effective hybrid (human-AI) teamwork is not solely dependent on AI performance alone, but also on its impact on human decision-making.

Decision Making

Hierarchical Analysis of Visual COVID-19 Features from Chest Radiographs

1 code implementation14 Jul 2021 Shruthi Bannur, Ozan Oktay, Melanie Bernhardt, Anton Schwaighofer, Rajesh Jena, Besmira Nushi, Sharan Wadhwani, Aditya Nori, Kal Natarajan, Shazad Ashraf, Javier Alvarez-Valle, Daniel C. Castro

Chest radiography has been a recommended procedure for patient triaging and resource management in intensive care units (ICUs) throughout the COVID-19 pandemic.

Management

Understanding Failures of Deep Networks via Robust Feature Extraction

1 code implementation CVPR 2021 Sahil Singla, Besmira Nushi, Shital Shah, Ece Kamar, Eric Horvitz

Traditional evaluation metrics for learned models that report aggregate scores over a test set are insufficient for surfacing important and informative patterns of failure over features and instances.

An Empirical Analysis of Backward Compatibility in Machine Learning Systems

no code implementations11 Aug 2020 Megha Srivastava, Besmira Nushi, Ece Kamar, Shital Shah, Eric Horvitz

In many applications of machine learning (ML), updates are performed with the goal of enhancing model performance.

BIG-bench Machine Learning

Is the Most Accurate AI the Best Teammate? Optimizing AI for Teamwork

no code implementations27 Apr 2020 Gagan Bansal, Besmira Nushi, Ece Kamar, Eric Horvitz, Daniel S. Weld

To optimize the team performance for this setting we maximize the team's expected utility, expressed in terms of the quality of the final decision, cost of verifying, and individual accuracies of people and machines.

Decision Making

SQuINTing at VQA Models: Introspecting VQA Models with Sub-Questions

no code implementations CVPR 2020 Ramprasaath R. Selvaraju, Purva Tendulkar, Devi Parikh, Eric Horvitz, Marco Ribeiro, Besmira Nushi, Ece Kamar

We quantify the extent to which this phenomenon occurs by creating a new Reasoning split of the VQA dataset and collecting VQA-introspect, a new dataset1 which consists of 238K new perception questions which serve as sub questions corresponding to the set of perceptual tasks needed to effectively answer the complex reasoning questions in the Reasoning split.

Visual Question Answering (VQA)

What You See Is What You Get? The Impact of Representation Criteria on Human Bias in Hiring

no code implementations8 Sep 2019 Andi Peng, Besmira Nushi, Emre Kiciman, Kori Inkpen, Siddharth Suri, Ece Kamar

Although systematic biases in decision-making are widely documented, the ways in which they emerge from different sources is less understood.

Decision Making

A Case for Backward Compatibility for Human-AI Teams

no code implementations4 Jun 2019 Gagan Bansal, Besmira Nushi, Ece Kamar, Dan Weld, Walter Lasecki, Eric Horvitz

We introduce the notion of the compatibility of an AI update with prior user experience and present methods for studying the role of compatibility in human-AI teams.

Decision Making

Metareasoning in Modular Software Systems: On-the-Fly Configuration using Reinforcement Learning with Rich Contextual Representations

no code implementations12 May 2019 Aditya Modi, Debadeepta Dey, Alekh Agarwal, Adith Swaminathan, Besmira Nushi, Sean Andrist, Eric Horvitz

We address the opportunity to maximize the utility of an overall computing system by employing reinforcement learning to guide the configuration of the set of interacting modules that comprise the system.

Decision Making reinforcement-learning +1

Towards Accountable AI: Hybrid Human-Machine Analyses for Characterizing System Failure

no code implementations19 Sep 2018 Besmira Nushi, Ece Kamar, Eric Horvitz

We present Pandora, a set of hybrid human-machine methods and tools for describing and explaining system failures.

BIG-bench Machine Learning Image Captioning

Crowd Access Path Optimization: Diversity Matters

no code implementations8 Aug 2015 Besmira Nushi, Adish Singla, Anja Gruenheid, Erfan Zamanian, Andreas Krause, Donald Kossmann

Based on this intuitive idea, we introduce the Access Path Model (APM), a novel crowd model that leverages the notion of access paths as an alternative way of retrieving information.

Cannot find the paper you are looking for? You can Submit a new open access paper.