Search Results for author: Yushi Hu

Found 15 papers, 8 papers with code

BLINK: Multimodal Large Language Models Can See but Not Perceive

no code implementations · 18 Apr 2024 · Xingyu Fu, Yushi Hu, Bangzheng Li, Yu Feng, Haoyu Wang, Xudong Lin, Dan Roth, Noah A. Smith, Wei-Chiu Ma, Ranjay Krishna

We introduce Blink, a new benchmark for multimodal language models (LLMs) that focuses on core visual perception abilities not found in other evaluations.

Depth Estimation · Multiple-choice · +1

Training Language Models to Generate Text with Citations via Fine-grained Rewards

no code implementations · 6 Feb 2024 · Chengyu Huang, Zeqiu Wu, Yushi Hu, Wenya Wang

While recent Large Language Models (LLMs) have proven useful in answering user queries, they are prone to hallucination, and their responses often lack credibility due to missing references to reliable sources.

Hallucination · Question Answering

Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models

no code implementations · 5 Dec 2023 · Yushi Hu, Otilia Stretcu, Chun-Ta Lu, Krishnamurthy Viswanathan, Kenji Hata, Enming Luo, Ranjay Krishna, Ariel Fuxman

We propose Visual Program Distillation (VPD), an instruction tuning framework that produces a vision-language model (VLM) capable of solving complex visual tasks with a single forward pass.

Language Modelling · Large Language Model · +3

DreamSync: Aligning Text-to-Image Generation with Image Understanding Feedback

no code implementations · 29 Nov 2023 · Jiao Sun, Deqing Fu, Yushi Hu, Su Wang, Royi Rassin, Da-Cheng Juan, Dana Alon, Charles Herrmann, Sjoerd van Steenkiste, Ranjay Krishna, Cyrus Rashtchian

DreamSync then uses two VLMs to select the best generation: a Visual Question Answering model that measures the alignment of generated images to the text, and another that measures the generation's aesthetic quality.

Question Answering · Text-to-Image Generation · +1
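
The selection step can be sketched in a few lines: among several images generated for one prompt, keep those whose VQA-based alignment score clears a threshold, then pick the most aesthetic. The scorers, threshold, and selection rule below are illustrative stand-ins, not DreamSync's released models.

```python
# Hedged sketch of VLM-based candidate selection; scorers and threshold are placeholders.
from typing import Callable, List

def select_best_generation(
    prompt: str,
    candidates: List[str],                          # e.g. paths of generated images
    alignment_score: Callable[[str, str], float],   # VQA-based text-image alignment in [0, 1]
    aesthetic_score: Callable[[str], float],        # aesthetic quality in [0, 1]
    min_alignment: float = 0.9,                     # assumed threshold, not from the paper
) -> str:
    faithful = [c for c in candidates if alignment_score(prompt, c) >= min_alignment]
    pool = faithful or candidates                   # fall back if nothing passes the filter
    return max(pool, key=aesthetic_score)

# Toy usage with dummy scorers standing in for the two VLMs.
best = select_best_generation(
    "a red bicycle leaning on a fence",
    ["img_0.png", "img_1.png", "img_2.png"],
    alignment_score=lambda p, c: {"img_0.png": 0.95, "img_1.png": 0.70, "img_2.png": 0.92}[c],
    aesthetic_score=lambda c: {"img_0.png": 0.60, "img_1.png": 0.90, "img_2.png": 0.80}[c],
)
print(best)  # img_2.png
```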

Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation

no code implementations · 27 Oct 2023 · Jaemin Cho, Yushi Hu, Roopal Garg, Peter Anderson, Ranjay Krishna, Jason Baldridge, Mohit Bansal, Jordi Pont-Tuset, Su Wang

With extensive experimentation and human evaluation on a range of model configurations (LLM, VQA, and T2I), we empirically demonstrate that DSG addresses the reliability challenges it targets.

Question Answering · Question Generation · +3

Fine-Grained Human Feedback Gives Better Rewards for Language Model Training

no code implementations · NeurIPS 2023 · Zeqiu Wu, Yushi Hu, Weijia Shi, Nouha Dziri, Alane Suhr, Prithviraj Ammanabrolu, Noah A. Smith, Mari Ostendorf, Hannaneh Hajishirzi

We introduce Fine-Grained RLHF, a framework that enables training and learning from reward functions that are fine-grained in two respects: (1) density, providing a reward after every segment (e.g., a sentence) is generated; and (2) incorporating multiple reward models associated with different feedback types (e.g., factual incorrectness, irrelevance, and information incompleteness).

Language Modelling · Long Form Question Answering · +2
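
A minimal sketch of the fine-grained reward assignment described above: each generated segment (here, a sentence) receives a reward that combines several reward models, one per feedback type. The weighted sum, the weights, and the dummy scorers are assumptions for illustration, not the paper's trained reward models.

```python
# Sketch of per-segment, multi-reward-model scoring; weights and scorers are placeholders.
from typing import Callable, Dict, List

def fine_grained_rewards(
    segments: List[str],
    reward_models: Dict[str, Callable[[str], float]],  # feedback type -> segment scorer
    weights: Dict[str, float],
) -> List[float]:
    return [
        sum(weights[name] * model(seg) for name, model in reward_models.items())
        for seg in segments
    ]

# Toy usage: three feedback types, one combined reward per sentence.
rewards = fine_grained_rewards(
    segments=["The Eiffel Tower is in Paris.", "It was built in 1700."],
    reward_models={
        "factuality":   lambda s: -1.0 if "1700" in s else 1.0,  # dummy factuality scorer
        "relevance":    lambda s: 0.5,
        "completeness": lambda s: 0.2,
    },
    weights={"factuality": 1.0, "relevance": 0.3, "completeness": 0.3},
)
print(rewards)  # approx [1.21, -0.79]
```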

TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering

1 code implementation · ICCV 2023 · Yushi Hu, Benlin Liu, Jungo Kasai, Yizhong Wang, Mari Ostendorf, Ranjay Krishna, Noah A. Smith

We introduce TIFA (Text-to-Image Faithfulness evaluation with question Answering), an automatic evaluation metric that measures the faithfulness of a generated image to its text input via visual question answering (VQA).

4k · Language Modelling · +4
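
The scoring loop described above can be sketched simply: questions and expected answers are derived from the text prompt, a VQA model answers each question on the generated image, and faithfulness is the fraction answered correctly. The QA pairs and the VQA function below are toy stand-ins, not the released TIFA pipeline.

```python
# Sketch of TIFA-style faithfulness scoring via question answering; all inputs are dummies.
from typing import Callable, List, Tuple

def tifa_style_score(
    image: str,
    qa_pairs: List[Tuple[str, str]],           # (question, expected answer) from the prompt
    vqa_answer: Callable[[str, str], str],     # (image, question) -> predicted answer
) -> float:
    correct = sum(
        vqa_answer(image, question).strip().lower() == answer.strip().lower()
        for question, answer in qa_pairs
    )
    return correct / len(qa_pairs)

# Toy usage with a dummy VQA model.
score = tifa_style_score(
    "generated.png",
    qa_pairs=[("What animal is shown?", "dog"), ("What color is the dog?", "brown")],
    vqa_answer=lambda img, q: "dog" if "animal" in q else "white",
)
print(score)  # 0.5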

PromptCap: Prompt-Guided Image Captioning for VQA with GPT-3

no code implementations · ICCV 2023 · Yushi Hu, Hang Hua, Zhengyuan Yang, Weijia Shi, Noah A. Smith, Jiebo Luo

PromptCap outperforms generic captions by a large margin and achieves state-of-the-art accuracy on knowledge-based VQA tasks (60.4% on OK-VQA and 59.6% on A-OKVQA).

Image Captioning · Question Answering · +3

One Embedder, Any Task: Instruction-Finetuned Text Embeddings

3 code implementations · 19 Dec 2022 · Hongjin Su, Weijia Shi, Jungo Kasai, Yizhong Wang, Yushi Hu, Mari Ostendorf, Wen-tau Yih, Noah A. Smith, Luke Zettlemoyer, Tao Yu

Our analysis suggests that INSTRUCTOR is robust to changes in instructions, and that instruction finetuning mitigates the challenge of training a single model on diverse datasets.

Information Retrieval · Learning Word Embeddings · +3
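
A usage sketch of the instruction-plus-text embedding interface, assuming the InstructorEmbedding package and the public hkunlp/instructor-large checkpoint; the instructions shown are examples, and the point is that the same frozen model adapts to different tasks by changing only the instruction.

```python
# Assumed usage of the InstructorEmbedding package (pip install InstructorEmbedding sentence-transformers).
from InstructorEmbedding import INSTRUCTOR

model = INSTRUCTOR("hkunlp/instructor-large")

# Each input pairs a task instruction with the text to embed.
pairs = [
    ["Represent the Wikipedia document for retrieval:",
     "The Eiffel Tower is a wrought-iron lattice tower in Paris."],
    ["Represent the scientific paper title for clustering:",
     "Acoustic span embeddings for multilingual query-by-example search"],
]
embeddings = model.encode(pairs)
print(embeddings.shape)  # (2, embedding_dim)
```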

PromptCap: Prompt-Guided Task-Aware Image Captioning

1 code implementation · 15 Nov 2022 · Yushi Hu, Hang Hua, Zhengyuan Yang, Weijia Shi, Noah A. Smith, Jiebo Luo

PromptCap outperforms generic captions by a large margin and achieves state-of-the-art accuracy on knowledge-based VQA tasks (60.4% on OK-VQA and 59.6% on A-OKVQA).

Image Captioning · Language Modelling · +5

Binding Language Models in Symbolic Languages

1 code implementation · 6 Oct 2022 · Zhoujun Cheng, Tianbao Xie, Peng Shi, Chengzu Li, Rahul Nadkarni, Yushi Hu, Caiming Xiong, Dragomir Radev, Mari Ostendorf, Luke Zettlemoyer, Noah A. Smith, Tao Yu

We propose Binder, a training-free neural-symbolic framework that maps the task input to a program, which (1) allows binding a unified API of language model (LM) functionalities to a programming language (e.g., SQL, Python) to extend its grammar coverage and thus tackle more diverse questions, (2) adopts an LM as both the program parser and the underlying model called by the API during execution, and (3) requires only a few in-context exemplar annotations.

Language Modelling · Semantic Parsing · +1
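
The binding idea can be illustrated conceptually: the LM parses the question into a program whose grammar is extended with an LM(...) call, and the same LM is invoked again when that expression is executed. The toy parser, executor, and dummy LM below are illustrative placeholders, not the released Binder implementation.

```python
# Conceptual sketch of parse-then-execute with an LM bound into the program's grammar.
from typing import Callable, Dict, List

def parse_to_program(question: str, lm: Callable[[str], str]) -> str:
    # In Binder-style prompting this call would include in-context exemplars (omitted here).
    return lm(f"Write a SQL-like program with an LM() column for: {question}")

def execute(program: str, rows: List[Dict[str, str]], lm: Callable[[str], str]) -> List[Dict[str, str]]:
    # Toy executor: evaluates the LM(...) predicate on every row and keeps the "yes" rows,
    # i.e. program == "SELECT name WHERE LM('is this a fruit?', name) = 'yes'".
    return [row for row in rows if lm(f"is this a fruit? {row['name']}") == "yes"]

def dummy_lm(prompt: str) -> str:
    if "Write a SQL-like program" in prompt:
        return "SELECT name WHERE LM('is this a fruit?', name) = 'yes'"
    return "yes" if ("apple" in prompt or "pear" in prompt) else "no"

program = parse_to_program("Which items are fruits?", dummy_lm)
table = [{"name": "apple"}, {"name": "chair"}, {"name": "pear"}]
print(execute(program, table, dummy_lm))  # [{'name': 'apple'}, {'name': 'pear'}]
```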

Unsupervised Learning of Hierarchical Conversation Structure

1 code implementation · 24 May 2022 · Bo-Ru Lu, Yushi Hu, Hao Cheng, Noah A. Smith, Mari Ostendorf

Human conversations can evolve in many different ways, creating challenges for automatic understanding and summarization.

In-Context Learning for Few-Shot Dialogue State Tracking

1 code implementation · 16 Mar 2022 · Yushi Hu, Chia-Hsuan Lee, Tianbao Xie, Tao Yu, Noah A. Smith, Mari Ostendorf

In this work, we propose an in-context learning (ICL) framework for zero-shot and few-shot dialogue state tracking (DST), where a large pre-trained language model (LM) takes a test instance and a few exemplars as input, and directly decodes the dialogue state without any parameter updates.

Dialogue State Tracking · Few-Shot Learning · +3
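
A minimal sketch of the in-context setup described above: (dialogue, state) exemplars plus the test dialogue are concatenated into a prompt and a frozen LM decodes the state directly, with no parameter updates. The prompt format and the completion function are illustrative assumptions, not the paper's exact exemplar-retrieval scheme.

```python
# Sketch of prompt construction and decoding for in-context dialogue state tracking.
from typing import Callable, List, Tuple

def dst_prompt(exemplars: List[Tuple[str, str]], test_dialogue: str) -> str:
    parts = [f"Dialogue: {d}\nState: {s}" for d, s in exemplars]
    parts.append(f"Dialogue: {test_dialogue}\nState:")
    return "\n\n".join(parts)

def predict_state(exemplars: List[Tuple[str, str]], test_dialogue: str,
                  lm_complete: Callable[[str], str]) -> str:
    return lm_complete(dst_prompt(exemplars, test_dialogue)).strip()

# Toy usage with a dummy completion function standing in for a large pre-trained LM.
exemplars = [
    ("I need a cheap hotel in the north.", "hotel-price=cheap; hotel-area=north"),
    ("Book a table for two at an Italian place.", "restaurant-food=italian; restaurant-people=2"),
]
state = predict_state(exemplars, "Find me an expensive hotel downtown.",
                      lm_complete=lambda prompt: " hotel-price=expensive; hotel-area=centre")
print(state)  # hotel-price=expensive; hotel-area=centre
```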

Acoustic span embeddings for multilingual query-by-example search

1 code implementation · 24 Nov 2020 · Yushi Hu, Shane Settle, Karen Livescu

In this work, we generalize acoustic word embedding (AWE) training to spans of words, producing acoustic span embeddings (ASE), and explore the application of ASE to query-by-example (QbE) search with arbitrary-length queries in multiple unseen languages.

Dynamic Time Warping · Word Embeddings
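
Embedding-based QbE search can be sketched conceptually: the spoken query and candidate spans from the search collection are mapped to fixed-dimensional span embeddings and ranked by cosine similarity, avoiding frame-level dynamic time warping at search time. The random vectors below stand in for the trained ASE encoder.

```python
# Conceptual sketch of query-by-example search with span embeddings; vectors are random stand-ins.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_candidates(query_emb: np.ndarray, candidate_embs: dict) -> list:
    # Return candidate span ids sorted by decreasing similarity to the query embedding.
    return sorted(candidate_embs, key=lambda k: cosine(query_emb, candidate_embs[k]), reverse=True)

# Toy usage: 128-d vectors standing in for acoustic span embeddings.
rng = np.random.default_rng(0)
candidates = {f"utt{i}_span{j}": rng.standard_normal(128) for i in range(3) for j in range(2)}
query = candidates["utt1_span0"] + 0.05 * rng.standard_normal(128)  # a slightly noisy match
print(rank_candidates(query, candidates)[0])  # utt1_span0
```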

Multilingual Jointly Trained Acoustic and Written Word Embeddings

1 code implementation · 24 Jun 2020 · Yushi Hu, Shane Settle, Karen Livescu

The pre-trained models can then be used for unseen zero-resource languages, or fine-tuned on data from low-resource languages.

Dynamic Time Warping · Retrieval · +1
