no code implementations • 24 Feb 2024 • Ying Shen, Zhiyang Xu, Qifan Wang, Yu Cheng, Wenpeng Yin, Lifu Huang
Multimodal Large Language Models (MLLMs) have demonstrated remarkable proficiency in diverse tasks across different domains, with an increasing focus on improving their zero-shot generalization capabilities for unseen multimodal tasks.
no code implementations • 18 Feb 2024 • Zhiyang Xu, Chao Feng, Rulin Shao, Trevor Ashby, Ying Shen, Di Jin, Yu Cheng, Qifan Wang, Lifu Huang
Despite vision-language models' (VLMs) remarkable capabilities as versatile visual assistants, two substantial challenges persist within existing VLM frameworks: (1) a lack of task diversity in pretraining and visual instruction tuning, and (2) annotation errors and bias in GPT-4 synthesized instruction tuning data.
no code implementations • 15 Nov 2023 • Minqian Liu, Ying Shen, Zhiyang Xu, Yixin Cao, Eunah Cho, Vaibhav Kumar, Reza Ghanadan, Lifu Huang
Natural Language Generation (NLG) typically involves evaluating the generated text in various aspects (e.g., consistency and naturalness) to obtain a comprehensive assessment.
1 code implementation • 8 Oct 2023 • Jingyuan Qi, Minqian Liu, Ying Shen, Zhiyang Xu, Lifu Huang
Automatically generating scripts (i.e., sequences of key steps described in text) from video demonstrations and reasoning about the subsequent steps are crucial for modern AI virtual assistants to guide humans through everyday tasks, especially unfamiliar ones.
no code implementations • 24 May 2023 • Barry Menglong Yao, Yu Chen, Qifan Wang, Sijia Wang, Minqian Liu, Zhiyang Xu, Licheng Yu, Lifu Huang
We propose attribute-aware multimodal entity linking, where the input is a mention described by both text and an image, and the goal is to predict the corresponding target entity from a multimodal knowledge base (KB) in which each entity is also described with a text description, an image, and a set of attribute-value pairs.
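The task setup above can be sketched with minimal data structures: a mention carries text plus an image, each KB entity carries a description, an image, and attribute-value pairs, and linking selects the highest-scoring entity. All names and the scoring interface here are illustrative assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Mention:
    """A mention to be linked: its textual context and an image reference."""
    text: str
    image_path: str

@dataclass
class Entity:
    """A multimodal KB entity: description, image, and attribute-value pairs."""
    entity_id: str
    description: str
    image_path: str
    attributes: dict = field(default_factory=dict)

def link(mention, kb, score):
    """Return the KB entity with the highest mention-entity compatibility score.

    `score` stands in for whatever multimodal matching model is used.
    """
    return max(kb, key=lambda entity: score(mention, entity))
```

A toy `score` (e.g., word overlap between mention text and entity description) is enough to exercise the interface; a real system would score text, image, and attribute channels jointly.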
no code implementations • 24 May 2023 • Xiaochu Li, Minqian Liu, Zhiyang Xu, Lifu Huang
To address these challenges, we propose joint biomedical entity linking and event extraction by treating the event structures and entity references in knowledge bases as latent variables and updating the two task-specific models in a hard Expectation-Maximization (EM) fashion: (1) predicting the missing variables for each partially annotated dataset based on the current two task-specific models, and (2) updating the parameters of each model on the corresponding pseudo-completed dataset.
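The hard-EM alternation described above can be sketched as a training loop: each model fills in the other dataset's missing labels (E-step), then each model is retrained on its pseudo-completed dataset (M-step). The function and method names here are illustrative assumptions, not the paper's code.

```python
def hard_em_joint_training(linker, extractor, linking_data, event_data, n_rounds=3):
    """Alternate hard-EM updates between two task-specific models.

    `linking_data` lacks event annotations; `event_data` lacks entity links.
    Each round: (E) predict the missing variable for every partially annotated
    example with the current models, then (M) refit each model on its
    pseudo-completed dataset. `predict` and `fit` are assumed interfaces.
    """
    for _ in range(n_rounds):
        # E-step: hard (argmax) predictions stand in for the latent variables.
        pseudo_events = [(x, extractor.predict(x)) for x in linking_data]
        pseudo_links = [(x, linker.predict(x)) for x in event_data]
        # M-step: update each model on the dataset completed by the other.
        extractor = extractor.fit(pseudo_events)
        linker = linker.fit(pseudo_links)
    return linker, extractor
```

Because the E-step commits to a single best assignment rather than a posterior distribution, this is "hard" EM; the two models bootstrap each other through the shared latent structure.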
1 code implementation • 24 May 2023 • Jingyuan Qi, Zhiyang Xu, Ying Shen, Minqian Liu, Di Jin, Qifan Wang, Lifu Huang
Chain-of-Thought (CoT) prompting enables large language models to solve complex reasoning problems by generating intermediate steps.
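A minimal sketch of few-shot CoT prompting: each exemplar pairs a question with a worked rationale ending in its answer, so the model imitates the step-by-step format for the new question. The template below is a generic illustration, not this paper's specific prompting method.

```python
def build_cot_prompt(examples, question):
    """Assemble a few-shot chain-of-thought prompt.

    `examples` is a list of (question, rationale, answer) triples; the
    rationale demonstrates the intermediate reasoning steps the model
    should generate before its final answer.
    """
    parts = [
        f"Q: {q}\nA: {rationale} The answer is {answer}."
        for q, rationale, answer in examples
    ]
    # The target question is left open for the model to complete.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)
```

The returned string would be sent to a language model, which continues from the final "A:" with its own intermediate steps.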
1 code implementation • 21 Dec 2022 • Zhiyang Xu, Ying Shen, Lifu Huang
Our results indicate that fine-tuning the model on a diverse set of tasks and instructions leads to a reduced sensitivity to variations in instructions for each task.
no code implementations • 25 May 2022 • Zhiyang Xu, Jay-Yoon Lee, Lifu Huang
Data scarcity has been the main factor that hinders the progress of event extraction.
no code implementations • 15 Apr 2022 • Apoorv Garg, Deval Srivastava, Zhiyang Xu, Lifu Huang
Due to their superior performance, large-scale pre-trained language models (PLMs) have been widely adopted in many aspects of human society.
1 code implementation • EMNLP 2021 • Zhiyang Xu, Andrew Drozdov, Jay Yoon Lee, Tim O'Gorman, Subendhu Rongali, Dylan Finkbeiner, Shilpa Suresh, Mohit Iyyer, Andrew McCallum
For over thirty years, researchers have developed and analyzed methods for latent tree induction as an approach for unsupervised syntactic parsing.
no code implementations • 19 Mar 2021 • Zhongyang Zhang, Zhiyang Xu, Zia Ahmed, Asif Salekin, Tauhidur Rahman
However, one of the fundamental limitations of these approaches is that they are highly dependent on image and camera settings and can only learn to map an input HSI with one specific setting to an output HSI with another.
Hyperspectral Image Super-Resolution • Image Super-Resolution • +1
1 code implementation • AKBC 2020 • Dung Thai, Zhiyang Xu, Nicholas Monath, Boris Veytsman, Andrew McCallum
In this paper, we describe a technique for using BibTeX to automatically generate a large-scale labeled dataset (41M labeled strings) that is four orders of magnitude larger than the current largest CFE dataset, namely the UMass Citation Field Extraction dataset [Anzaroot and McCallum, 2013].
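The idea of deriving labeled citation strings from BibTeX can be sketched as follows: since a BibTeX entry already stores each field separately, rendering it into a citation string yields token-level field labels for free. The template and field order below are simplified assumptions for illustration, not the paper's actual rendering pipeline.

```python
def render_labeled_citation(entry, template=("author", "title", "journal", "year")):
    """Render a BibTeX-style entry dict into (citation string, token labels).

    Each token in the rendered string is labeled with the field it came
    from, producing the kind of supervision used for citation field
    extraction (CFE) training data.
    """
    tokens, labels = [], []
    for fld in template:
        if fld not in entry:
            continue
        for tok in str(entry[fld]).split():
            tokens.append(tok)
            labels.append(fld)  # every token inherits its source field's label
    return " ".join(tokens), labels
```

Applying many rendering templates (different citation styles, punctuation, field orders) to a large BibTeX collection is what lets the dataset scale to millions of labeled strings.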