Search Results for author: Jack Hessel

Found 43 papers, 29 papers with code

Selective "Selective Prediction": Reducing Unnecessary Abstention in Vision-Language Reasoning

no code implementations • 23 Feb 2024 • Tejas Srinivasan, Jack Hessel, Tanmay Gupta, Bill Yuchen Lin, Yejin Choi, Jesse Thomason, Khyathi Raghavi Chandu

Prior work on selective prediction minimizes incorrect predictions from vision-language models (VLMs) by allowing them to abstain from answering when uncertain.

Paper
Add Code

L3GO: Language Agents with Chain-of-3D-Thoughts for Generating Unconventional Objects

no code implementations • 14 Feb 2024 • Yutaro Yamada, Khyathi Chandu, YuChen Lin, Jack Hessel, Ilker Yildirim, Yejin Choi

In this paper, we propose a language agent with chain-of-3D-thoughts (L3GO), an inference-time approach that can reason about part-based 3D mesh generation of unconventional objects that current data-driven diffusion models struggle with.

Image Generation Text to 3D

Paper
Add Code

Deal, or no deal (or who knows)? Forecasting Uncertainty in Conversations using Large Language Models

no code implementations • 5 Feb 2024 • Anthony Sicilia, Hyunwoo Kim, Khyathi Raghavi Chandu, Malihe Alikhani, Jack Hessel

Effective interlocutors account for the uncertain goals, beliefs, and emotions of others.

Paper
Add Code

OLMo: Accelerating the Science of Language Models

2 code implementations • 1 Feb 2024 • Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Valentina Pyatkin, Abhilasha Ravichander, Dustin Schwenk, Saurabh Shah, Will Smith, Emma Strubell, Nishant Subramani, Mitchell Wortsman, Pradeep Dasigi, Nathan Lambert, Kyle Richardson, Luke Zettlemoyer, Jesse Dodge, Kyle Lo, Luca Soldaini, Noah A. Smith, Hannaneh Hajishirzi

Given the importance of these details in scientifically studying these models, including their biases and potential risks, we believe it is essential for the research community to have access to powerful, truly open LMs.

Language Modelling

3,962

Paper
Code

NovaCOMET: Open Commonsense Foundation Models with Symbolic Knowledge Distillation

no code implementations • 10 Dec 2023 • Peter West, Ronan Le Bras, Taylor Sorensen, Bill Yuchen Lin, Liwei Jiang, Ximing Lu, Khyathi Chandu, Jack Hessel, Ashutosh Baheti, Chandra Bhagavatula, Yejin Choi

We present NovaCOMET, an open commonsense knowledge model, that combines the best aspects of knowledge and general task models.

Knowledge Distillation

Paper
Add Code

Localized Symbolic Knowledge Distillation for Visual Commonsense Models

2 code implementations • NeurIPS 2023 • Jae Sung Park, Jack Hessel, Khyathi Raghavi Chandu, Paul Pu Liang, Ximing Lu, Peter West, Youngjae Yu, Qiuyuan Huang, Jianfeng Gao, Ali Farhadi, Yejin Choi

Empirical results and human evaluations in a zero-shot setup demonstrate that our distillation method results in more precise VL models of reasoning compared to a baseline of passing a generated referring expression to an LLM.

Instruction Following Knowledge Distillation +3

Paper
Code

UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations

no code implementations • 14 Nov 2023 • Wenting Zhao, Justin T Chiu, Jena D. Hwang, Faeze Brahman, Jack Hessel, Sanjiban Choudhury, Yejin Choi, Xiang Lorraine Li, Alane Suhr

Language technologies that accurately model the dynamics of events must perform commonsense reasoning.

Imitation Learning Specificity

Paper
Add Code

Tailoring Self-Rationalizers with Multi-Reward Distillation

1 code implementation • 6 Nov 2023 • Sahana Ramnath, Brihi Joshi, Skyler Hallinan, Ximing Lu, Liunian Harold Li, Aaron Chan, Jack Hessel, Yejin Choi, Xiang Ren

Results on five difficult question-answering datasets StrategyQA, QuaRel, OpenBookQA, NumerSense and QASC show that not only does MaRio improve task accuracy, but it also improves the self-rationalization quality of small LMs across the aforementioned axes better than a supervised fine-tuning (SFT) baseline.

Question Answering StrategyQA

Paper
Code

What's "up" with vision-language models? Investigating their struggle with spatial reasoning

1 code implementation • 30 Oct 2023 • Amita Kamath, Jack Hessel, Kai-Wei Chang

Recent vision-language (VL) models are powerful, but can they reliably distinguish "right" from "left"?

Paper
Code

Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging

1 code implementation • 17 Oct 2023 • Joel Jang, Seungone Kim, Bill Yuchen Lin, Yizhong Wang, Jack Hessel, Luke Zettlemoyer, Hannaneh Hajishirzi, Yejin Choi, Prithviraj Ammanabrolu

In this work, we study Reinforcement Learning from Personalized Human Feedback (RLPHF) problem, wherein LLMs are aligned to multiple (sometimes conflicting) preferences by modeling alignment as a Multi-Objective Reinforcement Learning (MORL) problem.

Language Modelling Large Language Model +2

Paper
Code

Reading Books is Great, But Not if You Are Driving! Visually Grounded Reasoning about Defeasible Commonsense Norms

1 code implementation • 16 Oct 2023 • Seungju Han, Junhyeok Kim, Jack Hessel, Liwei Jiang, Jiwan Chung, Yejin Son, Yejin Choi, Youngjae Yu

NORMLENS consists of 10K human judgments accompanied by free-form explanations covering 2K multimodal situations, and serves as a probe to address two questions: (1) to what extent can models align with average human judgment?

Paper
Code

VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use

1 code implementation • 12 Aug 2023 • Yonatan Bitton, Hritik Bansal, Jack Hessel, Rulin Shao, Wanrong Zhu, Anas Awadalla, Josh Gardner, Rohan Taori, Ludwig Schmidt

These descriptions enable 1) collecting human-verified reference outputs for each instance; and 2) automatic evaluation of candidate multimodal generations using a text-only LLM, aligning with human judgment.

Instruction Following

Paper
Code

OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models

2 code implementations • 2 Aug 2023 • Anas Awadalla, Irena Gao, Josh Gardner, Jack Hessel, Yusuf Hanafy, Wanrong Zhu, Kalyani Marathe, Yonatan Bitton, Samir Gadre, Shiori Sagawa, Jenia Jitsev, Simon Kornblith, Pang Wei Koh, Gabriel Ilharco, Mitchell Wortsman, Ludwig Schmidt

We introduce OpenFlamingo, a family of autoregressive vision-language models ranging from 3B to 9B parameters.

Ranked #14 on Visual Question Answering (VQA) on InfiMM-Eval

Visual Question Answering

3,465

Paper
Code

FunQA: Towards Surprising Video Comprehension

1 code implementation • 26 Jun 2023 • Binzhu Xie, Sicheng Zhang, Zitang Zhou, Bo Li, Yuanhan Zhang, Jack Hessel, Jingkang Yang, Ziwei Liu

Surprising videos, such as funny clips, creative performances, or visual illusions, attract significant attention.

Question Answering Text Generation +3

Paper
Code

Symbolic Chain-of-Thought Distillation: Small Models Can Also "Think" Step-by-Step

1 code implementation • 24 Jun 2023 • Liunian Harold Li, Jack Hessel, Youngjae Yu, Xiang Ren, Kai-Wei Chang, Yejin Choi

We release our corpus of chain-of-thought samples and code.

Paper
Code

How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources

1 code implementation • NeurIPS 2023 • Yizhong Wang, Hamish Ivison, Pradeep Dasigi, Jack Hessel, Tushar Khot, Khyathi Raghavi Chandu, David Wadden, Kelsey MacMillan, Noah A. Smith, Iz Beltagy, Hannaneh Hajishirzi

Our evaluations show that the best model in any given evaluation reaches on average 87% of ChatGPT performance, and 73% of GPT-4 performance, suggesting that further investment in building better base models and instruction-tuning data is required to close the gap.

Instruction Following

995

Paper
Code

Text encoders bottleneck compositionality in contrastive vision-language models

1 code implementation • 24 May 2023 • Amita Kamath, Jack Hessel, Kai-Wei Chang

We first curate CompPrompts, a set of increasingly compositional image captions that VL models should be able to capture (e. g., single object, to object+property, to multiple interacting objects).

Attribute Image Captioning +1

Paper
Code

Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text

2 code implementations • NeurIPS 2023 • Wanrong Zhu, Jack Hessel, Anas Awadalla, Samir Yitzhak Gadre, Jesse Dodge, Alex Fang, Youngjae Yu, Ludwig Schmidt, William Yang Wang, Yejin Choi

We release Multimodal C4, an augmentation of the popular text-only C4 corpus with images interleaved.

Few-Shot Learning

863

Paper
Code

CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos

1 code implementation • ICCV 2023 • Seungju Han, Jack Hessel, Nouha Dziri, Yejin Choi, Youngjae Yu

To train CHAMPAGNE, we collect and release YTD-18M, a large-scale corpus of 18M video-based dialogues.

Language Modelling

Paper
Code

Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images

no code implementations • ICCV 2023 • Nitzan Bitton-Guetta, Yonatan Bitton, Jack Hessel, Ludwig Schmidt, Yuval Elovici, Gabriel Stanovsky, Roy Schwartz

We introduce WHOOPS!, a new dataset and benchmark for visual commonsense.

Ranked #1 on Image-to-Text Retrieval on WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images (using extra training data)

Common Sense Reasoning Explanation Generation +6

Paper
Add Code

Fusing Pre-Trained Language Models With Multimodal Prompts Through Reinforcement Learning

1 code implementation • CVPR 2023 • Youngjae Yu, Jiwan Chung, Heeseung Yun, Jack Hessel, Jae Sung Park, Ximing Lu, Rowan Zellers, Prithviraj Ammanabrolu, Ronan Le Bras, Gunhee Kim, Yejin Choi

Language models are capable of commonsense reasoning: while domain-specific models can learn from explicit knowledge (e. g. commonsense graphs [6], ethical norms [25]), and larger models like GPT-3 manifest broad commonsense reasoning capacity.

Language Modelling reinforcement-learning +2

Paper
Code

SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization

1 code implementation • 20 Dec 2022 • Hyunwoo Kim, Jack Hessel, Liwei Jiang, Peter West, Ximing Lu, Youngjae Yu, Pei Zhou, Ronan Le Bras, Malihe Alikhani, Gunhee Kim, Maarten Sap, Yejin Choi

Data scarcity has been a long standing issue in the field of open-domain social dialogue.

Dialogue Generation Large Language Model

200

Paper
Code

Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization

3 code implementations • 3 Oct 2022 • Rajkumar Ramamurthy, Prithviraj Ammanabrolu, Kianté Brantley, Jack Hessel, Rafet Sifa, Christian Bauckhage, Hannaneh Hajishirzi, Yejin Choi

To help answer this, we first introduce an open-source modular library, RL4LMs (Reinforcement Learning for Language Models), for optimizing language generators with RL.

Decision Making Policy Gradient Methods +3

2,088

Paper
Code

Do Androids Laugh at Electric Sheep? Humor "Understanding" Benchmarks from The New Yorker Caption Contest

no code implementations • 13 Sep 2022 • Jack Hessel, Ana Marasović, Jena D. Hwang, Lillian Lee, Jeff Da, Rowan Zellers, Robert Mankoff, Yejin Choi

Large neural networks can now generate jokes, but do they really "understand" humor?

Language Modelling

Paper
Add Code

Quark: Controllable Text Generation with Reinforced Unlearning

1 code implementation • 26 May 2022 • Ximing Lu, Sean Welleck, Jack Hessel, Liwei Jiang, Lianhui Qin, Peter West, Prithviraj Ammanabrolu, Yejin Choi

Large-scale language models often learn behaviors that are misaligned with user expectations.

Language Modelling Text Generation

Paper
Code

Multimodal Knowledge Alignment with Reinforcement Learning

1 code implementation • 25 May 2022 • Youngjae Yu, Jiwan Chung, Heeseung Yun, Jack Hessel, JaeSung Park, Ximing Lu, Prithviraj Ammanabrolu, Rowan Zellers, Ronan Le Bras, Gunhee Kim, Yejin Choi

Large language models readily adapt to novel settings, even without task-specific training data.

Audio captioning Language Modelling +3

Paper
Code

The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning

no code implementations • 10 Feb 2022 • Jack Hessel, Jena D. Hwang, Jae Sung Park, Rowan Zellers, Chandra Bhagavatula, Anna Rohrbach, Kate Saenko, Yejin Choi

We present Sherlock, an annotated corpus of 103K images for testing machine capacity for abductive reasoning beyond literal image contents.

Visual Abductive Reasoning Visual Reasoning

Paper
Add Code

MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound

no code implementations • CVPR 2022 • Rowan Zellers, Jiasen Lu, Ximing Lu, Youngjae Yu, Yanpeng Zhao, Mohammadreza Salehi, Aditya Kusupati, Jack Hessel, Ali Farhadi, Yejin Choi

Given a video, we replace snippets of text and audio with a MASK token; the model learns by choosing the correct masked-out snippet.

Ranked #6 on Action Classification on Kinetics-600 (using extra training data)

Action Classification Navigate +2

Paper
Add Code

Reframing Human-AI Collaboration for Generating Free-Text Explanations

1 code implementation • NAACL 2022 • Sarah Wiegreffe, Jack Hessel, Swabha Swayamdipta, Mark Riedl, Yejin Choi

We create a pipeline that combines GPT-3 with a supervised filter that incorporates binary acceptability judgments from humans in the loop.

Paper
Code

Connecting the Dots between Audio and Text without Parallel Data through Visual Knowledge Transfer

1 code implementation • NAACL 2022 • Yanpeng Zhao, Jack Hessel, Youngjae Yu, Ximing Lu, Rowan Zellers, Yejin Choi

In a difficult zero-shot setting with no paired audio-text data, our model demonstrates state-of-the-art zero-shot performance on the ESC50 and US8K audio classification tasks, and even surpasses the supervised state of the art for Clotho caption retrieval (with audio queries) by 2. 2\% R@1.

Audio Classification Audio Tagging +3

Paper
Code

Symbolic Knowledge Distillation: from General Language Models to Commonsense Models

1 code implementation • NAACL 2022 • Peter West, Chandra Bhagavatula, Jack Hessel, Jena D. Hwang, Liwei Jiang, Ronan Le Bras, Ximing Lu, Sean Welleck, Yejin Choi

We apply this to the ATOMIC resource, and share our new symbolic knowledge graph and commonsense models.

Knowledge Distillation Knowledge Graphs +2

129

Paper
Code

How effective is BERT without word ordering? Implications for language understanding and data privacy

no code implementations • ACL 2021 • Jack Hessel, Alexandra Schofield

Ordered word sequences contain the rich structures that define language.

Ethics QNLI

Paper
Add Code

MERLOT: Multimodal Neural Script Knowledge Models

1 code implementation • NeurIPS 2021 • Rowan Zellers, Ximing Lu, Jack Hessel, Youngjae Yu, Jae Sung Park, Jize Cao, Ali Farhadi, Yejin Choi

As humans, we understand events in the visual world contextually, performing multimodal reasoning across time to make inferences about the past, present, and future.

Multimodal Reasoning Visual Commonsense Reasoning

221

Paper
Code

CLIPScore: A Reference-free Evaluation Metric for Image Captioning

3 code implementations • EMNLP 2021 • Jack Hessel, Ari Holtzman, Maxwell Forbes, Ronan Le Bras, Yejin Choi

Image captioning has conventionally relied on reference-based automatic evaluations, where machine captions are compared against captions written by humans.

Ranked #1 on Hallucination Pair-wise Detection (4-ref) on FOIL

Hallucination Pair-wise Detection (1-ref) Hallucination Pair-wise Detection (4-ref) +3

159

Paper
Code

Domain-Specific Lexical Grounding in Noisy Visual-Textual Documents

1 code implementation • EMNLP 2020 • Gregory Yauney, Jack Hessel, David Mimno

Images can give us insights into the contextual meanings of words, but current image-text grounding approaches require detailed annotations.

Clustering object-detection +2

Paper
Code

Does my multimodal model learn cross-modal interactions? It's harder to tell than you might think!

no code implementations • EMNLP 2020 • Jack Hessel, Lillian Lee

Modeling expressive cross-modal interactions seems crucial in multimodal tasks, such as visual question answering.

Image-text Classification Question Answering +3

Paper
Add Code

Beyond Instructional Videos: Probing for More Diverse Visual-Textual Grounding on YouTube

1 code implementation • EMNLP 2020 • Jack Hessel, Zhenhai Zhu, Bo Pang, Radu Soricut

Pretraining from unlabelled web videos has quickly become the de-facto means of achieving high performance on many video understanding tasks.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Code

A Case Study on Combining ASR and Visual Features for Generating Instructional Video Captions

no code implementations • CONLL 2019 • Jack Hessel, Bo Pang, Zhenhai Zhu, Radu Soricut

Instructional videos get high-traffic on video sharing platforms, and prior work suggests that providing time-stamped, subtask annotations (e. g., "heat the oil in the pan") improves user experiences.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

Unsupervised Discovery of Multimodal Links in Multi-image, Multi-sentence Documents

2 code implementations • IJCNLP 2019 • Jack Hessel, Lillian Lee, David Mimno

Images and text co-occur constantly on the web, but explicit links between images and sentences (or other intra-document textual units) are often not present.

Sentence

Paper
Code

Something's Brewing! Early Prediction of Controversy-causing Posts from Discussion Features

no code implementations • NAACL 2019 • Jack Hessel, Lillian Lee

Controversial posts are those that split the preferences of a community, receiving both significant positive and significant negative feedback.

Paper
Add Code

Quantifying the visual concreteness of words and topics in multimodal datasets

1 code implementation • NAACL 2018 • Jack Hessel, David Mimno, Lillian Lee

Multimodal machine learning algorithms aim to learn visual-textual correspondences.

BIG-bench Machine Learning Image Captioning

Paper
Code

Cats and Captions vs. Creators and the Clock: Comparing Multimodal Content to Context in Predicting Relative Popularity

1 code implementation • 6 Mar 2017 • Jack Hessel, Lillian Lee, David Mimno

The content of today's social media is becoming more and more rich, increasingly mixing text, images, videos, and audio.

Paper
Code

Image Representations and New Domains in Neural Image Captioning

no code implementations • WS 2015 • Jack Hessel, Nicolas Savva, Michael J. Wilber

We examine the possibility that recent promising results in automatic caption generation are due primarily to language models.

Caption Generation Image Captioning

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.