Search Results for author: Leonard Tang

Found 11 papers, 5 papers with code

Introducing v0.5 of the AI Safety Benchmark from MLCommons

1 code implementation • 18 Apr 2024 • Bertie Vidgen, Adarsh Agrawal, Ahmed M. Ahmed, Victor Akinwande, Namir Al-Nuaimi, Najla Alfaraj, Elie Alhajjar, Lora Aroyo, Trupti Bavalatti, Borhane Blili-Hamelin, Kurt Bollacker, Rishi Bomassani, Marisa Ferrara Boston, Siméon Campos, Kal Chakra, Canyu Chen, Cody Coleman, Zacharie Delpierre Coudert, Leon Derczynski, Debojyoti Dutta, Ian Eisenberg, James Ezick, Heather Frase, Brian Fuller, Ram Gandikota, Agasthya Gangavarapu, Ananya Gangavarapu, James Gealy, Rajat Ghosh, James Goel, Usman Gohar, Sujata Goswami, Scott A. Hale, Wiebke Hutiri, Joseph Marvin Imperial, Surgan Jandial, Nick Judd, Felix Juefei-Xu, Foutse Khomh, Bhavya Kailkhura, Hannah Rose Kirk, Kevin Klyman, Chris Knotz, Michael Kuchnik, Shachi H. Kumar, Chris Lengerich, Bo Li, Zeyi Liao, Eileen Peters Long, Victor Lu, Yifan Mai, Priyanka Mary Mammen, Kelvin Manyeki, Sean McGregor, Virendra Mehta, Shafee Mohammed, Emanuel Moss, Lama Nachman, Dinesh Jinenhally Naganna, Amin Nikanjam, Besmira Nushi, Luis Oala, Iftach Orr, Alicia Parrish, Cigdem Patlak, William Pietri, Forough Poursabzi-Sangdeh, Eleonora Presani, Fabrizio Puletti, Paul Röttger, Saurav Sahay, Tim Santos, Nino Scherrer, Alice Schoenauer Sebag, Patrick Schramowski, Abolfazl Shahbazi, Vin Sharma, Xudong Shen, Vamsi Sistla, Leonard Tang, Davide Testuggine, Vithursan Thangarasa, Elizabeth Anne Watkins, Rebecca Weiss, Chris Welty, Tyler Wilbers, Adina Williams, Carole-Jean Wu, Poonam Yadav, Xianjun Yang, Yi Zeng, Wenhui Zhang, Fedor Zhdanov, Jiacheng Zhu, Percy Liang, Peter Mattson, Joaquin Vanschoren

We created a new taxonomy of 13 hazard categories, of which 7 have tests in the v0.5 benchmark.

Consistent Explanations in the Face of Model Indeterminacy via Ensembling

no code implementations • 9 Jun 2023 • Dan Ley, Leonard Tang, Matthew Nazari, Hongjin Lin, Suraj Srinivas, Himabindu Lakkaraju

This work addresses the challenge of providing consistent explanations for predictive models in the presence of model indeterminacy, which arises due to the existence of multiple (nearly) equally well-performing models for a given dataset and task.

Baselines for Identifying Watermarked Large Language Models

no code implementations • 29 May 2023 • Leonard Tang, Gavin Uberti, Tom Shlomi

We consider the emerging problem of identifying the presence and use of watermarking schemes in widely used, publicly hosted, closed source large language models (LLMs).

Learning the Wrong Lessons: Inserting Trojans During Knowledge Distillation

no code implementations • 9 Mar 2023 • Leonard Tang, Tom Shlomi, Alexander Cai

In recent years, knowledge distillation has become a cornerstone of efficiently deployed machine learning, with labs and industries using knowledge distillation to train models that are inexpensive and resource-optimized.

Knowledge Distillation

MAUD: An Expert-Annotated Legal NLP Dataset for Merger Agreement Understanding

2 code implementations • 2 Jan 2023 • Steven H. Wang, Antoine Scardigli, Leonard Tang, Wei Chen, Dimitry Levkin, Anya Chen, Spencer Ball, Thomas Woodside, Oliver Zhang, Dan Hendrycks

Reading comprehension of legal text can be a particularly challenging task due to the length and complexity of legal clauses and a shortage of expert-annotated datasets.

Reading Comprehension

The Naughtyformer: A Transformer Understands Offensive Humor

no code implementations • 25 Nov 2022 • Leonard Tang, Alexander Cai, Steve Li, Jason Wang

Jokes are intentionally written to be funny, but not all jokes are created the same.

Humor Detection

Lila: A Unified Benchmark for Mathematical Reasoning

1 code implementation • 31 Oct 2022 • Swaroop Mishra, Matthew Finlayson, Pan Lu, Leonard Tang, Sean Welleck, Chitta Baral, Tanmay Rajpurohit, Oyvind Tafjord, Ashish Sabharwal, Peter Clark, Ashwin Kalyan

Mathematical reasoning skills are essential for general-purpose intelligent systems to perform tasks from grocery shopping to climate modeling.

Ranked #1 on Mathematical Reasoning on Lila (OOD)

Mathematical Reasoning • Question Answering

From Human Days to Machine Seconds: Automatically Answering and Generating Machine Learning Final Exams

no code implementations • 11 Jun 2022 • Iddo Drori, Sarah J. Zhang, Reece Shuttleworth, Sarah Zhang, Keith Tyser, Zad Chin, Pedro Lantigua, Saisamrit Surbehera, Gregory Hunter, Derek Austin, Leonard Tang, Yann Hicke, Sage Simhon, Sathwik Karnik, Darnell Granberry, Madeleine Udell

We curate a dataset and benchmark of questions from machine learning final exams available online and code for answering these questions and generating new questions.

BIG-bench Machine Learning • Few-Shot Learning • +4

A Neural Network Solves, Explains, and Generates University Math Problems by Program Synthesis and Few-Shot Learning at Human Level

1 code implementation • 31 Dec 2021 • Iddo Drori, Sarah Zhang, Reece Shuttleworth, Leonard Tang, Albert Lu, Elizabeth Ke, Kevin Liu, Linda Chen, Sunny Tran, Newman Cheng, Roman Wang, Nikhil Singh, Taylor L. Patti, Jayson Lynch, Avi Shporer, Nakul Verma, Eugene Wu, Gilbert Strang

We automatically synthesize programs using few-shot learning and OpenAI's Codex transformer and execute them to solve course problems at 81% automatic accuracy.

Few-Shot Learning • Language Modelling • +4

PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures

2 code implementations • CVPR 2022 • Dan Hendrycks, Andy Zou, Mantas Mazeika, Leonard Tang, Bo Li, Dawn Song, Jacob Steinhardt

In real-world applications of machine learning, reliable and safe systems must consider measures of performance beyond standard test set accuracy.

Adversarial Robustness • Anomaly Detection • +1

Solving Probability and Statistics Problems by Program Synthesis

no code implementations • 16 Nov 2021 • Leonard Tang, Elizabeth Ke, Nikhil Singh, Nakul Verma, Iddo Drori

Our work is the first to introduce a new dataset of university-level probability and statistics problems and solve these problems in a scalable fashion using the program synthesis capabilities of large language models.

Program Synthesis • Prompt Engineering
