Search Results for author: Niket Tandon

Found 42 papers, 14 papers with code

Think about it! Improving defeasible reasoning by first modeling the question scenario.

1 code implementation EMNLP 2021 Aman Madaan, Niket Tandon, Dheeraj Rajagopal, Peter Clark, Yiming Yang, Eduard Hovy

Defeasible reasoning is the mode of reasoning where conclusions can be overturned by taking into account new evidence.

proScript: Partially Ordered Scripts Generation

no code implementations Findings (EMNLP) 2021 Keisuke Sakaguchi, Chandra Bhagavatula, Ronan Le Bras, Niket Tandon, Peter Clark, Yejin Choi

Scripts – prototypical event sequences describing everyday activities – have been shown to help understand narratives by providing expectations, resolving ambiguity, and filling in unstated information.

Text Generation valid

WorldValuesBench: A Large-Scale Benchmark Dataset for Multi-Cultural Value Awareness of Language Models

1 code implementation 25 Apr 2024 Wenlong Zhao, Debanjan Mondal, Niket Tandon, Danica Dillion, Kurt Gray, Yuling Gu

The awareness of multi-cultural human values is critical to the ability of language models (LMs) to generate safe and personalized responses.

Calibrating Large Language Models with Sample Consistency

no code implementations 21 Feb 2024 Qing Lyu, Kumar Shridhar, Chaitanya Malaviya, Li Zhang, Yanai Elazar, Niket Tandon, Marianna Apidianaki, Mrinmaya Sachan, Chris Callison-Burch

Accurately gauging the confidence level of Large Language Models' (LLMs) predictions is pivotal for their reliable application.

In-Context Principle Learning from Mistakes

no code implementations 8 Feb 2024 Tianjun Zhang, Aman Madaan, Luyu Gao, Steven Zheng, Swaroop Mishra, Yiming Yang, Niket Tandon, Uri Alon

We evaluate LEAP on a wide range of benchmarks, including multi-hop question answering (Hotpot QA), textual QA (DROP), Big-Bench Hard reasoning, and math problems (GSM8K and MATH); in all these benchmarks, LEAP improves the strongest available LLMs such as GPT-3.5-turbo, GPT-4, GPT-4 turbo and Claude-2.1.

GSM8K In-Context Learning

One Size Does Not Fit All: Customizing Open-Domain Procedures

no code implementations 16 Nov 2023 Yash Kumar Lal, Li Zhang, Faeze Brahman, Bodhisattwa Prasad Majumder, Peter Clark, Niket Tandon

Our approach is to test several simple multi-LLM-agent architectures for customization, as well as an end-to-end LLM, using a new evaluation set, called CustomPlans, of over 200 WikiHow procedures each with a customization need.

Well begun is half done: Importance of Starting Right in Multi-Step Math Reasoning

no code implementations 14 Nov 2023 Kushal Jain, Niket Tandon, Kumar Shridhar

We propose two ways in which a smaller model can benefit from initial guidance: 1) asking an LLM for initial guidance, and 2) self-questioning guidance, where the student model can first initiate a question regarding how to start and then continue that chain.

GSM8K Math

What Makes it Ok to Set a Fire? Iterative Self-distillation of Contexts and Rationales for Disambiguating Defeasible Social and Moral Situations

no code implementations 24 Oct 2023 Kavel Rao, Liwei Jiang, Valentina Pyatkin, Yuling Gu, Niket Tandon, Nouha Dziri, Faeze Brahman, Yejin Choi

From this model we distill a high-quality dataset, \delta-Rules-of-Thumb, of 1.2M entries of contextualizations and rationales for 115K defeasible moral actions rated highly by human annotators 85.9% to 99.8% of the time.

Imitation Learning

CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization

no code implementations 16 Oct 2023 Bodhisattwa Prasad Majumder, Bhavana Dalvi Mishra, Peter Jansen, Oyvind Tafjord, Niket Tandon, Li Zhang, Chris Callison-Burch, Peter Clark

Language agents have shown some ability to interact with an external environment, e.g., a virtual world such as ScienceWorld, to perform complex tasks, e.g., growing a plant, without the startup costs of reinforcement learning.

Let Me Teach You: Pedagogical Foundations of Feedback for Language Models

no code implementations 1 Jul 2023 Beatriz Borges, Niket Tandon, Tanja Käser, Antoine Bosselut

In a different world, research in pedagogy has long established several effective feedback models.

Editing Common Sense in Transformers

no code implementations 24 May 2023 Anshita Gupta, Debanjan Mondal, Akshay Krishna Sheshadri, Wenlong Zhao, Xiang Lorraine Li, Sarah Wiegreffe, Niket Tandon

However, these editing methods have only been evaluated on statements about encyclopedic knowledge with a single correct answer.

Common Sense Reasoning Model Editing

Aligning Language Models to User Opinions

no code implementations 24 May 2023 EunJeong Hwang, Bodhisattwa Prasad Majumder, Niket Tandon

An important aspect of developing LLMs that interact with humans is to align models' behavior to their users.

Open-Ended Question Answering

OpenPI2.0: An Improved Dataset for Entity Tracking in Texts

1 code implementation 24 May 2023 Li Zhang, Hainiu Xu, Abhinav Kommula, Chris Callison-Burch, Niket Tandon

An earlier dataset, OpenPI, provided crowdsourced annotations of entity state changes in text.

Question Answering

Memory-assisted prompt editing to improve GPT-3 after deployment

1 code implementation 16 Jan 2022 Aman Madaan, Niket Tandon, Peter Clark, Yiming Yang

Large LMs such as GPT-3 are powerful, but can commit mistakes that are obvious to humans.

Interscript: A dataset for interactive learning of scripts through error feedback

1 code implementation 15 Dec 2021 Niket Tandon, Aman Madaan, Peter Clark, Keisuke Sakaguchi, Yiming Yang

We present a new dataset, Interscript, containing user feedback on a deployed model that generates complex everyday tasks.

Structured Prediction

Think about it! Improving defeasible reasoning by first modeling the question scenario

1 code implementation 24 Oct 2021 Aman Madaan, Niket Tandon, Dheeraj Rajagopal, Peter Clark, Yiming Yang, Eduard Hovy

Defeasible reasoning is the mode of reasoning where conclusions can be overturned by taking into account new evidence.

Improving Neural Model Performance through Natural Language Feedback on Their Explanations

no code implementations 18 Apr 2021 Aman Madaan, Niket Tandon, Dheeraj Rajagopal, Yiming Yang, Peter Clark, Keisuke Sakaguchi, Ed Hovy

A class of explainable NLP models for reasoning tasks support their decisions by generating free-form or structured explanations, but what happens when these supporting structures contain errors?

proScript: Partially Ordered Scripts Generation via Pre-trained Language Models

no code implementations 16 Apr 2021 Keisuke Sakaguchi, Chandra Bhagavatula, Ronan Le Bras, Niket Tandon, Peter Clark, Yejin Choi

Scripts - standardized event sequences describing typical everyday activities - have been shown to help understand narratives by providing expectations, resolving ambiguity, and filling in unstated information.

Text Generation valid

Information to Wisdom: Commonsense Knowledge Extraction and Compilation

no code implementations 4 Mar 2021 Simon Razniewski, Niket Tandon, Aparna S. Varde

Commonsense knowledge is a foundational cornerstone of artificial intelligence applications.

A Dataset for Tracking Entities in Open Domain Procedural Text

no code implementations EMNLP 2020 Niket Tandon, Keisuke Sakaguchi, Bhavana Dalvi Mishra, Dheeraj Rajagopal, Peter Clark, Michal Guerquin, Kyle Richardson, Eduard Hovy

Our solution is a new task formulation where given just a procedural text as input, the task is to generate a set of state change tuples (entity, attribute, before-state, after-state) for each step, where the entity, attribute, and state values must be predicted from an open vocabulary.

Attribute

Do Dogs have Whiskers? A New Knowledge Base of hasPart Relations

no code implementations 12 Jun 2020 Sumithra Bhakthavatsalam, Kyle Richardson, Niket Tandon, Peter Clark

We present a new knowledge-base of hasPart relationships, extracted from a large corpus of generic statements.

WIQA: A dataset for "What if..." reasoning over procedural text

1 code implementation 10 Sep 2019 Niket Tandon, Bhavana Dalvi Mishra, Keisuke Sakaguchi, Antoine Bosselut, Peter Clark

We introduce WIQA, the first large-scale dataset of "What if..." questions over procedural text.

Multiple-choice

Everything Happens for a Reason: Discovering the Purpose of Actions in Procedural Text

no code implementations IJCNLP 2019 Bhavana Dalvi Mishra, Niket Tandon, Antoine Bosselut, Wen-tau Yih, Peter Clark

Our goal is to better comprehend procedural text, e.g., a paragraph about photosynthesis, by not only predicting what happens, but why some actions need to happen before others.

Reading Comprehension

From 'F' to 'A' on the N.Y. Regents Science Exams: An Overview of the Aristo Project

no code implementations 4 Sep 2019 Peter Clark, Oren Etzioni, Daniel Khashabi, Tushar Khot, Bhavana Dalvi Mishra, Kyle Richardson, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord, Niket Tandon, Sumithra Bhakthavatsalam, Dirk Groeneveld, Michal Guerquin, Michael Schmitz

This paper reports unprecedented success on the Grade 8 New York Regents Science Exam, where for the first time a system scores more than 90% on the exam's non-diagram, multiple choice (NDMC) questions.

Multiple-choice Question Answering

Know2Look: Commonsense Knowledge for Visual Search

no code implementations WS 2016 Sreyasi Nag Chowdhury, Niket Tandon, Gerhard Weikum

With the rise in popularity of social media, images accompanied by contextual text form a huge section of the web.

Retrieval

Be Consistent! Improving Procedural Text Comprehension using Label Consistency

1 code implementation NAACL 2019 Xinya Du, Bhavana Dalvi Mishra, Niket Tandon, Antoine Bosselut, Wen-tau Yih, Peter Clark, Claire Cardie

Our goal is procedural text comprehension, namely tracking how the properties of entities (e.g., their location) change with time given a procedural text (e.g., a paragraph about photosynthesis, a recipe).

Reading Comprehension

Reasoning about Actions and State Changes by Injecting Commonsense Knowledge

1 code implementation EMNLP 2018 Niket Tandon, Bhavana Dalvi Mishra, Joel Grus, Wen-tau Yih, Antoine Bosselut, Peter Clark

Comprehending procedural text, e.g., a paragraph describing photosynthesis, requires modeling actions and the state changes they produce, so that questions about entities at different timepoints can be answered.

Reading Comprehension Structured Prediction

Tracking State Changes in Procedural Text: A Challenge Dataset and Models for Process Paragraph Comprehension

no code implementations NAACL 2018 Bhavana Dalvi Mishra, Lifu Huang, Niket Tandon, Wen-tau Yih, Peter Clark

The new dataset, ProPara, is the first to contain natural (rather than machine-generated) text about a changing world along with a full annotation of entity states (location and existence) during those changes (81k datapoints).

Procedural Text Understanding

What Happened? Leveraging VerbNet to Predict the Effects of Actions in Procedural Text

no code implementations 15 Apr 2018 Peter Clark, Bhavana Dalvi, Niket Tandon

To supply this knowledge, we leverage VerbNet to build a rulebase (called the Semantic Lexicon) of the preconditions and effects of actions, and use it along with commonsense knowledge of persistence to answer questions about change.

Reading Comprehension

Learning Language-Visual Embedding for Movie Understanding with Natural-Language

no code implementations 26 Sep 2016 Atousa Torabi, Niket Tandon, Leonid Sigal

We evaluate our models on the large-scale LSMDC16 movie dataset for two tasks: 1) standard ranking for video annotation and retrieval, and 2) our proposed movie multiple-choice test.

Multiple-choice Retrieval

Movie Description

no code implementations 12 May 2016 Anna Rohrbach, Atousa Torabi, Marcus Rohrbach, Niket Tandon, Christopher Pal, Hugo Larochelle, Aaron Courville, Bernt Schiele

In addition, we collected and aligned movie scripts used in prior work and compared the two sources of descriptions.

Benchmarking

A Dataset for Movie Description

no code implementations CVPR 2015 Anna Rohrbach, Marcus Rohrbach, Niket Tandon, Bernt Schiele

In this work we propose a novel dataset which contains transcribed DVS, which is temporally aligned to full length HD movies.

Benchmarking Descriptive
