Search Results for author: Chaorui Deng

Found 11 papers, 6 papers with code

Likelihood-Based Text-to-Image Evaluation with Patch-Level Perceptual and Semantic Credit Assignment

1 code implementation • 16 Aug 2023 • Qi Chen, Chaorui Deng, Zixiong Huang, BoWen Zhang, Mingkui Tan, Qi Wu

In this paper, we propose to evaluate text-to-image generation performance by directly estimating the likelihood of the generated images using a pre-trained likelihood-based text-to-image generative model, i.e., a higher likelihood indicates better perceptual quality and better text-image alignment.

Text-to-Image Generation
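The scoring idea in the abstract can be sketched as follows. This is a minimal illustration, not the paper's method: the pre-trained generative model is mocked by fixed per-token probabilities, where a real evaluator would query the model for the likelihood it assigns to each image token given the text.

```python
import numpy as np

def image_log_likelihood(token_probs):
    """Average log-likelihood of an image's tokens. In the real
    setting, token_probs would come from a pre-trained
    text-conditional generative model; here they are given directly.
    Higher values indicate better perceptual quality and alignment."""
    return float(np.mean(np.log(token_probs)))

# Toy per-token probabilities standing in for model outputs.
good = image_log_likelihood(np.array([0.8, 0.9, 0.7]))
bad = image_log_likelihood(np.array([0.1, 0.2, 0.1]))
```

A well-aligned image should receive a higher score than a poorly aligned one, which is the basis of the proposed evaluation.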

Identity-Consistent Aggregation for Video Object Detection

1 code implementation ICCV 2023 Chaorui Deng, Da Chen, Qi Wu

In Video Object Detection (VID), a common practice is to leverage the rich temporal contexts from the video to enhance the object representations in each frame.

Object Detection +1
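The common practice the abstract describes, enhancing each frame's object representations with temporal context from other frames, is often implemented with cross-frame attention. The sketch below is a generic scaled dot-product attention over proposals from nearby frames, not the paper's identity-consistent aggregation specifically; all shapes and names are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def aggregate_temporal_context(query, context):
    """Enhance the current frame's object features (query, [N, D])
    with object features from nearby frames (context, [M, D])
    via scaled dot-product attention plus a residual connection."""
    d = query.shape[-1]
    attn = softmax(query @ context.T / np.sqrt(d))
    return query + attn @ context

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))    # 4 proposals in the current frame
c = rng.normal(size=(16, 8))   # 16 proposals pooled from other frames
enhanced = aggregate_temporal_context(q, c)
```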

Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval

1 code implementation • ICCV 2023 • Chaorui Deng, Qi Chen, Pengda Qin, Da Chen, Qi Wu

In text-video retrieval, recent works have benefited from the powerful learning capabilities of pre-trained text-image foundation models (e.g., CLIP) by adapting them to the video domain.

Retrieval Video Captioning +1
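A common baseline for adapting a text-image model like CLIP to video is to mean-pool per-frame features into one video embedding and rank videos by cosine similarity to the text embedding. The sketch below shows only that baseline, with random vectors standing in for CLIP features; the paper's prompt-switch mechanism is a more efficient adaptation on top of this setup.

```python
import numpy as np

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def video_embedding(frame_feats):
    """Mean-pool per-frame features ([T, D]) into one video embedding."""
    return normalize(frame_feats.mean(axis=0))

def retrieve(text_emb, video_embs):
    """Rank videos by cosine similarity to the text embedding."""
    sims = normalize(video_embs) @ normalize(text_emb)
    return np.argsort(-sims)

rng = np.random.default_rng(1)
text = rng.normal(size=8)                      # stand-in for a CLIP text feature
videos = np.stack([video_embedding(rng.normal(size=(12, 8)))
                   for _ in range(5)])         # 5 videos, 12 frames each
ranking = retrieve(text, videos)
```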

Learning Distinct and Representative Styles for Image Captioning

1 code implementation • 17 Sep 2022 • Qi Chen, Chaorui Deng, Qi Wu

Our innovative idea is to explore the rich modes in the training caption corpus to learn a set of "mode embeddings", and further use them to control the mode of the generated captions for existing image captioning models.

Image Captioning Word Embeddings
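The control mechanism described above, selecting one of a set of learned "mode embeddings" to steer the generated caption, can be sketched as a simple additive conditioning step. Everything here is illustrative: the embedding table would be learned during training, and the fused feature would feed an actual captioning decoder.

```python
import numpy as np

rng = np.random.default_rng(2)
NUM_MODES, DIM = 4, 8
mode_embeddings = rng.normal(size=(NUM_MODES, DIM))  # learned in practice

def condition_on_mode(visual_feat, mode_id):
    """Fuse a visual feature with the chosen mode embedding; a
    captioning decoder conditioned on the result would generate a
    caption in that mode's style."""
    return visual_feat + mode_embeddings[mode_id]

v = rng.normal(size=DIM)
out_a = condition_on_mode(v, 0)
out_b = condition_on_mode(v, 1)
```

Different mode choices yield different conditioned features, which is what lets one model produce distinct caption styles for the same image.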

Sketch, Ground, and Refine: Top-Down Dense Video Captioning

no code implementations • CVPR 2021 • Chaorui Deng, ShiZhe Chen, Da Chen, Yuan He, Qi Wu

The dense video captioning task aims to detect and describe a sequence of events in a video for detailed and coherent storytelling.

Dense Video Captioning Sentence

Double Forward Propagation for Memorized Batch Normalization

no code implementations • 10 Oct 2020 • Yong Guo, Qingyao Wu, Chaorui Deng, Jian Chen, Mingkui Tan

Although the standard BN can significantly accelerate the training of DNNs and improve the generalization performance, it has several underlying limitations which may hamper the performance in both training and inference.
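One limitation the abstract alludes to is that standard BN computes its statistics from the current mini-batch alone, so small or unrepresentative batches give noisy normalization. The sketch below shows plain batch normalization to make that dependence concrete; it is not the paper's Memorized BN, which accumulates statistics over multiple batches.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Standard batch normalization over a mini-batch x of shape
    [batch, features]. Mean and variance come from this batch only,
    which is the dependence Memorized BN is designed to relax."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = batch_norm(x)
```

After normalization each feature has (approximately) zero mean and unit variance within the batch.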

Referring Expression Comprehension: A Survey of Methods and Datasets

no code implementations • 19 Jul 2020 • Yanyuan Qiao, Chaorui Deng, Qi Wu

In this survey, we first examine the state of the art by comparing modern approaches to the problem.

Object Detection +2

Length-Controllable Image Captioning

1 code implementation • ECCV 2020 • Chaorui Deng, Ning Ding, Mingkui Tan, Qi Wu

We verify the merit of the proposed length level embedding on three models: two state-of-the-art (SOTA) autoregressive models with different types of decoder, as well as our proposed non-autoregressive model, to show its generalization ability.

Controllable Image Captioning
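The "length level embedding" named in the abstract can be pictured as an extra embedding, one per target length range, added to every token embedding so the decoder is conditioned on the desired caption length. The sketch below is a hypothetical rendering of that idea; the number of levels, dimensions, and the embedding table itself are all illustrative stand-ins for learned components.

```python
import numpy as np

rng = np.random.default_rng(3)
NUM_LEVELS, DIM = 4, 8  # e.g. short / medium / long / very long
length_level_emb = rng.normal(size=(NUM_LEVELS, DIM))  # learned in practice

def embed_with_length_level(token_embs, level):
    """Add the chosen length-level embedding to every token
    embedding, conditioning the decoder on a length range."""
    return token_embs + length_level_emb[level]

tokens = rng.normal(size=(5, DIM))
short_cond = embed_with_length_level(tokens, 0)
long_cond = embed_with_length_level(tokens, NUM_LEVELS - 1)
```

Because the level is just an input, the same trick plugs into autoregressive and non-autoregressive decoders alike, which matches the generalization claim in the abstract.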

Deep High-Resolution Representation Learning for Visual Recognition

42 code implementations • 20 Aug 2019 • Jingdong Wang, Ke Sun, Tianheng Cheng, Borui Jiang, Chaorui Deng, Yang Zhao, Dong Liu, Yadong Mu, Mingkui Tan, Xinggang Wang, Wenyu Liu, Bin Xiao

High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection.

Ranked #1 on Object Detection on COCO test-dev (Hardware Burden metric)

Dichotomous Image Segmentation Face Alignment +7

You Only Look & Listen Once: Towards Fast and Accurate Visual Grounding

no code implementations • 12 Feb 2019 • Chaorui Deng, Qi Wu, Guanghui Xu, Zhuliang Yu, Yanwu Xu, Kui Jia, Mingkui Tan

Most state-of-the-art methods in VG operate in a two-stage manner: in the first stage, an object detector generates a set of object proposals from the input image; the second stage is then formulated as a cross-modal matching problem that finds the best match between the language query and all region proposals.

Object Detection +2
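The second-stage matching step of such a two-stage pipeline reduces to scoring each region proposal against the query. A minimal sketch, with random vectors standing in for the query and proposal embeddings a real model would produce:

```python
import numpy as np

def best_proposal(query_emb, proposal_embs):
    """Second stage of a two-stage visual grounding pipeline:
    return the index of the region proposal whose embedding has
    the highest cosine similarity with the language query."""
    q = query_emb / np.linalg.norm(query_emb)
    p = proposal_embs / np.linalg.norm(proposal_embs, axis=1, keepdims=True)
    return int(np.argmax(p @ q))

rng = np.random.default_rng(4)
proposals = rng.normal(size=(10, 8))  # embeddings of detector proposals
query = proposals[3] * 2.0            # toy query aligned with proposal 3
match = best_proposal(query, proposals)
```

The paper argues against relying on this two-stage split for speed; the sketch just makes the baseline pipeline concrete.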

Visual Grounding via Accumulated Attention

no code implementations CVPR 2018 Chaorui Deng, Qi Wu, Qingyao Wu, Fuyuan Hu, Fan Lyu, Mingkui Tan

There are three main challenges in VG: 1) what is the main focus in a query; 2) how to understand an image; 3) how to locate an object.

Sentence Visual Grounding
