1 code implementation • 16 Aug 2023 • Qi Chen, Chaorui Deng, Zixiong Huang, BoWen Zhang, Mingkui Tan, Qi Wu
In this paper, we propose to evaluate text-to-image generation performance by directly estimating the likelihood of the generated images using a pre-trained likelihood-based text-to-image generative model, i.e., a higher likelihood indicates better perceptual quality and better text-image alignment.
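The idea in this snippet — score each generated image by its likelihood under a pre-trained model and prefer higher-scoring images — can be illustrated with a minimal sketch. The `toy_log_likelihood` function below is a stand-in for the pre-trained likelihood-based text-to-image model, not the paper's actual scorer; the images are flattened pixel lists and the text conditioning is fake.

```python
# Illustrative sketch: ranking text-to-image outputs by model likelihood.
# A higher log-likelihood under the (stand-in) pre-trained model is taken
# to mean better perceptual quality and text-image alignment.

def toy_log_likelihood(image, text):
    """Stand-in density: rewards pixels close to a fake text-derived target."""
    target = (len(text) % 7) / 7.0  # fake text conditioning, not a real encoder
    return -sum((p - target) ** 2 for p in image) / len(image)

def rank_by_likelihood(candidates, text):
    """Sort generated images so the most likely (best-scoring) come first."""
    return sorted(candidates, key=lambda img: toy_log_likelihood(img, text),
                  reverse=True)

text = "a red apple on a table"
candidates = [
    [0.1, 0.1, 0.1],   # candidate image A (flattened pixels)
    [0.9, 0.8, 0.9],   # candidate image B
]
ranked = rank_by_likelihood(candidates, text)
```

In practice the scorer would be an actual likelihood-based generative model (e.g., an autoregressive model or a diffusion model's variational bound); only the ranking-by-likelihood logic carries over from this sketch.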
1 code implementation • ICCV 2023 • Chaorui Deng, Da Chen, Qi Wu
In Video Object Detection (VID), a common practice is to leverage the rich temporal contexts from the video to enhance the object representations in each frame.
Ranked #1 on Video Object Detection on ImageNet VID (MAP metric)
1 code implementation • ICCV 2023 • Chaorui Deng, Qi Chen, Pengda Qin, Da Chen, Qi Wu
In text-video retrieval, recent works have benefited from the powerful learning capabilities of pre-trained text-image foundation models (e.g., CLIP) by adapting them to the video domain.
1 code implementation • 17 Sep 2022 • Qi Chen, Chaorui Deng, Qi Wu
Our innovative idea is to explore the rich modes in the training caption corpus to learn a set of "mode embeddings", and further use them to control the mode of the generated captions for existing image captioning models.
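The snippet above describes conditioning an existing captioner on learned "mode embeddings" to control caption style. The sketch below shows only the conditioning step, with made-up modes, dimensions, and a toy token embedding; it is not the paper's architecture.

```python
# Illustrative sketch: a set of "mode embeddings" is added to the decoder's
# token embeddings so a single captioning model can generate in different
# modes. The modes and vectors here are toy stand-ins (the paper learns them
# from the training caption corpus).

MODES = {"factual": [1.0, 0.0], "detailed": [0.0, 1.0]}

def embed(token):
    """Toy 2-d token embedding (a real model would use a learned table)."""
    return [float(len(token)), 1.0]

def decoder_input(tokens, mode):
    """Add the chosen mode embedding to every token embedding."""
    m = MODES[mode]
    return [[t + mm for t, mm in zip(embed(tok), m)] for tok in tokens]

x_factual = decoder_input(["a", "dog"], "factual")
x_detail = decoder_input(["a", "dog"], "detailed")
```

Because the two modes yield different decoder inputs for the same tokens, the downstream decoder can be steered toward different caption styles at inference time.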
no code implementations • CVPR 2021 • Chaorui Deng, ShiZhe Chen, Da Chen, Yuan He, Qi Wu
The dense video captioning task aims to detect and describe a sequence of events in a video for detailed and coherent storytelling.
no code implementations • 10 Oct 2020 • Yong Guo, Qingyao Wu, Chaorui Deng, Jian Chen, Mingkui Tan
Although the standard BN can significantly accelerate the training of DNNs and improve the generalization performance, it has several underlying limitations which may hamper the performance in both training and inference.
no code implementations • 19 Jul 2020 • Yanyuan Qiao, Chaorui Deng, Qi Wu
In this survey, we first examine the state of the art by comparing modern approaches to the problem.
1 code implementation • ECCV 2020 • Chaorui Deng, Ning Ding, Mingkui Tan, Qi Wu
We verify the merit of the proposed length level embedding on three models: two state-of-the-art (SOTA) autoregressive models with different types of decoder, as well as our proposed non-autoregressive model, to show its generalization ability.
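The "length level embedding" mentioned above can be sketched as follows: bucket the desired caption length into a discrete level, then add that level's embedding to the decoder input. The bucket boundaries and embedding values below are made up for illustration; only the bucketing-plus-embedding pattern reflects the snippet.

```python
# Illustrative sketch of a length level embedding for length-controllable
# captioning. Buckets and embedding dims are hypothetical stand-ins.

LEVEL_BOUNDS = [9, 14, 19]  # hypothetical max lengths per level

def length_level(target_len):
    """Map a target caption length to a discrete level index."""
    for level, bound in enumerate(LEVEL_BOUNDS):
        if target_len <= bound:
            return level
    return len(LEVEL_BOUNDS)  # longest bucket

# One (toy) embedding vector per level.
LEVEL_EMB = [[0.1 * i, 1.0 - 0.1 * i] for i in range(len(LEVEL_BOUNDS) + 1)]

def condition(token_emb, target_len):
    """Add the length-level embedding to a token embedding."""
    lvl = LEVEL_EMB[length_level(target_len)]
    return [t + l for t, l in zip(token_emb, lvl)]
```

Because the level is an input rather than a model change, the same embedding can be bolted onto autoregressive or non-autoregressive decoders alike, which matches the generalization claim in the snippet.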
42 code implementations • 20 Aug 2019 • Jingdong Wang, Ke Sun, Tianheng Cheng, Borui Jiang, Chaorui Deng, Yang Zhao, Dong Liu, Yadong Mu, Mingkui Tan, Xinggang Wang, Wenyu Liu, Bin Xiao
High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection.
Ranked #1 on Object Detection on COCO test-dev (Hardware Burden metric)
no code implementations • 12 Feb 2019 • Chaorui Deng, Qi Wu, Guanghui Xu, Zhuliang Yu, Yanwu Xu, Kui Jia, Mingkui Tan
Most state-of-the-art methods in VG operate in a two-stage manner: in the first stage, an object detector generates a set of object proposals from the input image; the second stage is then formulated as a cross-modal matching problem that finds the best match between the language query and all region proposals.
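The two-stage pipeline described above can be sketched end to end: stage one proposes regions, stage two scores each proposal against the query and keeps the best match. The detector, text encoder, and similarity function below are toy stand-ins introduced only to make the pipeline concrete.

```python
# Illustrative sketch of two-stage visual grounding:
#   stage 1: an object detector produces region proposals (with features);
#   stage 2: cross-modal matching picks the proposal best matching the query.

def detect_proposals(image):
    """Stage 1 stand-in: candidate boxes with dummy 2-d region features."""
    return [("box_a", [0.9, 0.1]), ("box_b", [0.2, 0.8])]

def query_features(query):
    """Toy text encoder: fake 2-d feature derived from the query string."""
    return [1.0, 0.0] if "left" in query else [0.0, 1.0]

def ground(image, query):
    """Stage 2: dot-product matching between query and region features."""
    q = query_features(query)

    def score(item):
        _, feat = item
        return sum(f * qq for f, qq in zip(feat, q))

    return max(detect_proposals(image), key=score)[0]
```

The sketch also makes the pipeline's known weakness visible: if the stage-1 detector misses the referred object, no amount of stage-2 matching can recover it, which is the kind of limitation one-stage VG methods aim to address.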
no code implementations • CVPR 2018 • Chaorui Deng, Qi Wu, Qingyao Wu, Fuyuan Hu, Fan Lyu, Mingkui Tan
There are three main challenges in VG: 1) what is the main focus in a query; 2) how to understand an image; 3) how to locate an object.