1 code implementation • 16 Apr 2024 • Yuchi Wang, Shuhuai Ren, Rundong Gao, Linli Yao, Qingyan Guo, Kaikai An, Jianhong Bai, Xu Sun
Diffusion models have exhibited remarkable capabilities in text-to-image generation.
1 code implementation • 4 Dec 2023 • Shuhuai Ren, Linli Yao, Shicheng Li, Xu Sun, Lu Hou
This work proposes TimeChat, a time-sensitive multimodal large language model specifically designed for long video understanding.
no code implementations • 15 May 2023 • Linli Yao, Yuanmeng Zhang, Ziheng Wang, Xinglin Hou, Tiezheng Ge, Yuning Jiang, Qin Jin
In this paper, we propose a novel Video Description Editing (VDEdit) task to automatically revise an existing video description guided by flexible user requests.
1 code implementation • 21 Apr 2023 • Weijing Chen, Linli Yao, Qin Jin
The reason is that a large number of the images and texts in these benchmarks are coarse-grained.
1 code implementation • 17 Nov 2022 • Linli Yao, Weijing Chen, Qin Jin
Automatically generating textual descriptions for massive unlabeled images on the web can greatly benefit realistic web applications, e.g., multimodal retrieval and recommendation.
1 code implementation • 9 Feb 2022 • Linli Yao, Weiying Wang, Qin Jin
The Image Difference Captioning (IDC) task aims to describe the visual differences between two similar images with natural language.
1 code implementation • 12 Apr 2020 • Shizhe Chen, Weiying Wang, Ludan Ruan, Linli Yao, Qin Jin
The goal of the YouMakeup VQA Challenge 2020 is to provide a common benchmark for fine-grained action understanding in domain-specific videos, e.g., makeup instructional videos.