Search Results for author: Yushi Hu

Found 15 papers, 8 papers with code

BLINK: Multimodal Large Language Models Can See but Not Perceive

no code implementations · 18 Apr 2024 · Xingyu Fu, Yushi Hu, Bangzheng Li, Yu Feng, Haoyu Wang, Xudong Lin, Dan Roth, Noah A. Smith, Wei-Chiu Ma, Ranjay Krishna

We introduce Blink, a new benchmark for multimodal language models (LLMs) that focuses on core visual perception abilities not found in other evaluations.

Depth Estimation · Multiple-choice · +1

Training Language Models to Generate Text with Citations via Fine-grained Rewards

no code implementations · 6 Feb 2024 · Chengyu Huang, Zeqiu Wu, Yushi Hu, Wenya Wang

While recent Large Language Models (LLMs) have proven useful in answering user queries, they are prone to hallucination, and their responses often lack credibility due to missing references to reliable sources.

Hallucination · Question Answering

Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models

no code implementations · 5 Dec 2023 · Yushi Hu, Otilia Stretcu, Chun-Ta Lu, Krishnamurthy Viswanathan, Kenji Hata, Enming Luo, Ranjay Krishna, Ariel Fuxman

We propose Visual Program Distillation (VPD), an instruction tuning framework that produces a vision-language model (VLM) capable of solving complex visual tasks with a single forward pass.

Language Modelling · Large Language Model · +3

DreamSync: Aligning Text-to-Image Generation with Image Understanding Feedback

no code implementations · 29 Nov 2023 · Jiao Sun, Deqing Fu, Yushi Hu, Su Wang, Royi Rassin, Da-Cheng Juan, Dana Alon, Charles Herrmann, Sjoerd van Steenkiste, Ranjay Krishna, Cyrus Rashtchian

DreamSync then uses two VLMs to select the best generation: a Visual Question Answering model that measures the alignment of generated images to the text, and another that measures the generation's aesthetic quality.

Question Answering · Text-to-Image Generation · +1
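
The selection step can be sketched in a few lines: among several images generated for one prompt, keep those whose VQA-based alignment score clears a threshold, then pick the most aesthetic. The scorers, threshold, and selection rule below are illustrative stand-ins, not DreamSync's released models.

```python
# Hedged sketch of VLM-based candidate selection; scorers and threshold are placeholders.
from typing import Callable, List

def select_best_generation(
    prompt: str,
    candidates: List[str],                          # e.g. paths of generated images
    alignment_score: Callable[[str, str], float],   # VQA-based text-image alignment in [0, 1]
    aesthetic_score: Callable[[str], float],        # aesthetic quality in [0, 1]
    min_alignment: float = 0.9,                     # assumed threshold, not from the paper
) -> str:
    faithful = [c for c in candidates if alignment_score(prompt, c) >= min_alignment]
    pool = faithful or candidates                   # fall back if nothing passes the filter
    return max(pool, key=aesthetic_score)

# Toy usage with dummy scorers standing in for the two VLMs.
best = select_best_generation(
    "a red bicycle leaning on a fence",
    ["img_0.png", "img_1.png", "img_2.png"],
    alignment_score=lambda p, c: {"img_0.png": 0.95, "img_1.png": 0.70, "img_2.png": 0.92}[c],
    aesthetic_score=lambda c: {"img_0.png": 0.60, "img_1.png": 0.90, "img_2.png": 0.80}[c],
)
print(best)  # img_2.png
```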

Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation

no code implementations · 27 Oct 2023 · Jaemin Cho, Yushi Hu, Roopal Garg, Peter Anderson, Ranjay Krishna, Jason Baldridge, Mohit Bansal, Jordi Pont-Tuset, Su Wang

With extensive experimentation and human evaluation on a range of model configurations (LLM, VQA, and T2I), we empirically demonstrate that DSG addresses the reliability challenges it targets.

Question Answering · Question Generation · +3

Fine-Grained Human Feedback Gives Better Rewards for Language Model Training

no code implementations · NeurIPS 2023 · Zeqiu Wu, Yushi Hu, Weijia Shi, Nouha Dziri, Alane Suhr, Prithviraj Ammanabrolu, Noah A. Smith, Mari Ostendorf, Hannaneh Hajishirzi

We introduce Fine-Grained RLHF, a framework that enables training and learning from reward functions that are fine-grained in two respects: (1) density, providing a reward after every segment (e.g., a sentence) is generated; and (2) incorporating multiple reward models associated with different feedback types (e.g., factual incorrectness, irrelevance, and information incompleteness).

Language Modelling · Long Form Question Answering · +2
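
A minimal sketch of the fine-grained reward assignment described above: each generated segment (here, a sentence) receives a reward that combines several reward models, one per feedback type. The weighted sum, the weights, and the dummy scorers are assumptions for illustration, not the paper's trained reward models.

```python
# Sketch of per-segment, multi-reward-model scoring; weights and scorers are placeholders.
from typing import Callable, Dict, List

def fine_grained_rewards(
    segments: List[str],
    reward_models: Dict[str, Callable[[str], float]],  # feedback type -> segment scorer
    weights: Dict[str, float],
) -> List[float]:
    return [
        sum(weights[name] * model(seg) for name, model in reward_models.items())
        for seg in segments
    ]

# Toy usage: three feedback types, one combined reward per sentence.
rewards = fine_grained_rewards(
    segments=["The Eiffel Tower is in Paris.", "It was built in 1700."],
    reward_models={
        "factuality":   lambda s: -1.0 if "1700" in s else 1.0,  # dummy factuality scorer
        "relevance":    lambda s: 0.5,
        "completeness": lambda s: 0.2,
    },
    weights={"factuality": 1.0, "relevance": 0.3, "completeness": 0.3},
)
print(rewards)  # approx [1.21, -0.79]
```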

TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering

1 code implementation · ICCV 2023 · Yushi Hu, Benlin Liu, Jungo Kasai, Yizhong Wang, Mari Ostendorf, Ranjay Krishna, Noah A. Smith

We introduce TIFA (Text-to-Image Faithfulness evaluation with question Answering), an automatic evaluation metric that measures the faithfulness of a generated image to its text input via visual question answering (VQA).

4k · Language Modelling · +4
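
The scoring loop described above can be sketched simply: questions and expected answers are derived from the text prompt, a VQA model answers each question on the generated image, and faithfulness is the fraction answered correctly. The QA pairs and the VQA function below are toy stand-ins, not the released TIFA pipeline.

```python
# Sketch of TIFA-style faithfulness scoring via question answering; all inputs are dummies.
from typing import Callable, List, Tuple

def tifa_style_score(
    image: str,
    qa_pairs: List[Tuple[str, str]],           # (question, expected answer) from the prompt
    vqa_answer: Callable[[str, str], str],     # (image, question) -> predicted answer
) -> float:
    correct = sum(
        vqa_answer(image, question).strip().lower() == answer.strip().lower()
        for question, answer in qa_pairs
    )
    return correct / len(qa_pairs)

# Toy usage with a dummy VQA model.
score = tifa_style_score(
    "generated.png",
    qa_pairs=[("What animal is shown?", "dog"), ("What color is the dog?", "brown")],
    vqa_answer=lambda img, q: "dog" if "animal" in q else "white",
)
print(score)  # 0.5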

PromptCap: Prompt-Guided Image Captioning for VQA with GPT-3

no code implementations · ICCV 2023 · Yushi Hu, Hang Hua, Zhengyuan Yang, Weijia Shi, Noah A. Smith, Jiebo Luo

PromptCap outperforms generic captions by a large margin and achieves state-of-the-art accuracy on knowledge-based VQA tasks (60.4% on OK-VQA and 59.6% on A-OKVQA).

Image Captioning · Question Answering · +3

One Embedder, Any Task: Instruction-Finetuned Text Embeddings

3 code implementations · 19 Dec 2022 · Hongjin Su, Weijia Shi, Jungo Kasai, Yizhong Wang, Yushi Hu, Mari Ostendorf, Wen-tau Yih, Noah A. Smith, Luke Zettlemoyer, Tao Yu

Our analysis suggests that INSTRUCTOR is robust to changes in instructions, and that instruction finetuning mitigates the challenge of training a single model on diverse datasets.

Information Retrieval · Learning Word Embeddings · +3
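
A usage sketch of the instruction-plus-text embedding interface, assuming the InstructorEmbedding package and the public hkunlp/instructor-large checkpoint; the instructions shown are examples, and the point is that the same frozen model adapts to different tasks by changing only the instruction.

```python
# Assumed usage of the InstructorEmbedding package (pip install InstructorEmbedding sentence-transformers).
from InstructorEmbedding import INSTRUCTOR

model = INSTRUCTOR("hkunlp/instructor-large")

# Each input pairs a task instruction with the text to embed.
pairs = [
    ["Represent the Wikipedia document for retrieval:",
     "The Eiffel Tower is a wrought-iron lattice tower in Paris."],
    ["Represent the scientific paper title for clustering:",
     "Acoustic span embeddings for multilingual query-by-example search"],
]
embeddings = model.encode(pairs)
print(embeddings.shape)  # (2, embedding_dim)
```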

PromptCap: Prompt-Guided Task-Aware Image Captioning

1 code implementation · 15 Nov 2022 · Yushi Hu, Hang Hua, Zhengyuan Yang, Weijia Shi, Noah A. Smith, Jiebo Luo

PromptCap outperforms generic captions by a large margin and achieves state-of-the-art accuracy on knowledge-based VQA tasks (60.4% on OK-VQA and 59.6% on A-OKVQA).

Image Captioning · Language Modelling · +5

Binding Language Models in Symbolic Languages

1 code implementation · 6 Oct 2022 · Zhoujun Cheng, Tianbao Xie, Peng Shi, Chengzu Li, Rahul Nadkarni, Yushi Hu, Caiming Xiong, Dragomir Radev, Mari Ostendorf, Luke Zettlemoyer, Noah A. Smith, Tao Yu

We propose Binder, a training-free neural-symbolic framework that maps the task input to a program, which (1) allows binding a unified API of language model (LM) functionalities to a programming language (e.g., SQL, Python) to extend its grammar coverage and thus tackle more diverse questions, (2) adopts an LM as both the program parser and the underlying model called by the API during execution, and (3) requires only a few in-context exemplar annotations.

Language Modelling · Semantic Parsing · +1
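
The binding idea can be illustrated conceptually: the LM parses the question into a program whose grammar is extended with an LM(...) call, and the same LM is invoked again when that expression is executed. The toy parser, executor, and dummy LM below are illustrative placeholders, not the released Binder implementation.

```python
# Conceptual sketch of parse-then-execute with an LM bound into the program's grammar.
from typing import Callable, Dict, List

def parse_to_program(question: str, lm: Callable[[str], str]) -> str:
    # In Binder-style prompting this call would include in-context exemplars (omitted here).
    return lm(f"Write a SQL-like program with an LM() column for: {question}")

def execute(program: str, rows: List[Dict[str, str]], lm: Callable[[str], str]) -> List[Dict[str, str]]:
    # Toy executor: evaluates the LM(...) predicate on every row and keeps the "yes" rows,
    # i.e. program == "SELECT name WHERE LM('is this a fruit?', name) = 'yes'".
    return [row for row in rows if lm(f"is this a fruit? {row['name']}") == "yes"]

def dummy_lm(prompt: str) -> str:
    if "Write a SQL-like program" in prompt:
        return "SELECT name WHERE LM('is this a fruit?', name) = 'yes'"
    return "yes" if ("apple" in prompt or "pear" in prompt) else "no"

program = parse_to_program("Which items are fruits?", dummy_lm)
table = [{"name": "apple"}, {"name": "chair"}, {"name": "pear"}]
print(execute(program, table, dummy_lm))  # [{'name': 'apple'}, {'name': 'pear'}]
```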

Unsupervised Learning of Hierarchical Conversation Structure

1 code implementation · 24 May 2022 · Bo-Ru Lu, Yushi Hu, Hao Cheng, Noah A. Smith, Mari Ostendorf

Human conversations can evolve in many different ways, creating challenges for automatic understanding and summarization.

In-Context Learning for Few-Shot Dialogue State Tracking

1 code implementation · 16 Mar 2022 · Yushi Hu, Chia-Hsuan Lee, Tianbao Xie, Tao Yu, Noah A. Smith, Mari Ostendorf

In this work, we propose an in-context learning (ICL) framework for zero-shot and few-shot dialogue state tracking (DST), where a large pre-trained language model (LM) takes a test instance and a few exemplars as input, and directly decodes the dialogue state without any parameter updates.

Dialogue State Tracking · Few-Shot Learning · +3
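
A minimal sketch of the in-context setup described above: (dialogue, state) exemplars plus the test dialogue are concatenated into a prompt and a frozen LM decodes the state directly, with no parameter updates. The prompt format and the completion function are illustrative assumptions, not the paper's exact exemplar-retrieval scheme.

```python
# Sketch of prompt construction and decoding for in-context dialogue state tracking.
from typing import Callable, List, Tuple

def dst_prompt(exemplars: List[Tuple[str, str]], test_dialogue: str) -> str:
    parts = [f"Dialogue: {d}\nState: {s}" for d, s in exemplars]
    parts.append(f"Dialogue: {test_dialogue}\nState:")
    return "\n\n".join(parts)

def predict_state(exemplars: List[Tuple[str, str]], test_dialogue: str,
                  lm_complete: Callable[[str], str]) -> str:
    return lm_complete(dst_prompt(exemplars, test_dialogue)).strip()

# Toy usage with a dummy completion function standing in for a large pre-trained LM.
exemplars = [
    ("I need a cheap hotel in the north.", "hotel-price=cheap; hotel-area=north"),
    ("Book a table for two at an Italian place.", "restaurant-food=italian; restaurant-people=2"),
]
state = predict_state(exemplars, "Find me an expensive hotel downtown.",
                      lm_complete=lambda prompt: " hotel-price=expensive; hotel-area=centre")
print(state)  # hotel-price=expensive; hotel-area=centre
```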

Acoustic span embeddings for multilingual query-by-example search

1 code implementation · 24 Nov 2020 · Yushi Hu, Shane Settle, Karen Livescu

In this work, we generalize acoustic word embedding (AWE) training to spans of words, producing acoustic span embeddings (ASE), and explore the application of ASE to query-by-example (QbE) search with arbitrary-length queries in multiple unseen languages.

Dynamic Time Warping · Word Embeddings
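
Embedding-based QbE search can be sketched conceptually: the spoken query and candidate spans from the search collection are mapped to fixed-dimensional span embeddings and ranked by cosine similarity, avoiding frame-level dynamic time warping at search time. The random vectors below stand in for the trained ASE encoder.

```python
# Conceptual sketch of query-by-example search with span embeddings; vectors are random stand-ins.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_candidates(query_emb: np.ndarray, candidate_embs: dict) -> list:
    # Return candidate span ids sorted by decreasing similarity to the query embedding.
    return sorted(candidate_embs, key=lambda k: cosine(query_emb, candidate_embs[k]), reverse=True)

# Toy usage: 128-d vectors standing in for acoustic span embeddings.
rng = np.random.default_rng(0)
candidates = {f"utt{i}_span{j}": rng.standard_normal(128) for i in range(3) for j in range(2)}
query = candidates["utt1_span0"] + 0.05 * rng.standard_normal(128)  # a slightly noisy match
print(rank_candidates(query, candidates)[0])  # utt1_span0
```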

Multilingual Jointly Trained Acoustic and Written Word Embeddings

1 code implementation · 24 Jun 2020 · Yushi Hu, Shane Settle, Karen Livescu

The pre-trained models can then be used for unseen zero-resource languages, or fine-tuned on data from low-resource languages.

Dynamic Time Warping · Retrieval · +1
