Search Results for author: Qi Dai

Found 36 papers, 18 papers with code

MPII: Multi-Level Mutual Promotion for Inference and Interpretation

1 code implementation • ACL 2022 • Yan Liu, Sanyuan Chen, Yazheng Yang, Qi Dai

In this paper, we propose a multi-level Mutual Promotion mechanism for self-evolved Inference and sentence-level Interpretation (MPII).

Sentence

Effectiveness of Self-Assessment Software to Evaluate Preclinical Operative Procedures

no code implementations • 8 Apr 2024 • Qi Dai, Ryan Davis, Houlin Hong, Ying Gu

Class II preparation at 400 μm tolerance had the smallest mean difference of 0.41 points.

An edge detection-based deep learning approach for tear meniscus height measurement

no code implementations • 23 Mar 2024 • Kesheng Wang, Kunhui Xu, Xiaoyu Chen, Chunlei He, Jianfeng Zhang, Dexing Kong, Qi Dai, Shoujun Huang

For improved segmentation of the pupil and tear meniscus areas, the convolutional neural network Inceptionv3 was first implemented as an image quality assessment model, effectively identifying higher-quality images with an accuracy of 98.224%.

Edge Detection Image Quality Assessment

MotionEditor: Editing Video Motion via Content-Aware Diffusion

1 code implementation • 30 Nov 2023 • Shuyuan Tu, Qi Dai, Zhi-Qi Cheng, Han Hu, Xintong Han, Zuxuan Wu, Yu-Gang Jiang

This mechanism enables the editing branch to query the key and value from the reconstruction branch in a decoupled manner, making the editing branch retain the original background and protagonist appearance.

Video Editing

ART•V: Auto-Regressive Text-to-Video Generation with Diffusion Models

no code implementations • 30 Nov 2023 • Wenming Weng, Ruoyu Feng, Yanhui Wang, Qi Dai, Chunyu Wang, Dacheng Yin, Zhiyuan Zhao, Kai Qiu, Jianmin Bao, Yuhui Yuan, Chong Luo, Yueyi Zhang, Zhiwei Xiong

Second, it preserves the high-fidelity generation ability of the pre-trained image diffusion models by making only minimal network modifications.

Text-to-Video Generation Video Generation

VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models

no code implementations • 30 Nov 2023 • Zhen Xing, Qi Dai, Zihao Zhang, Hui Zhang, Han Hu, Zuxuan Wu, Yu-Gang Jiang

Our model can edit and translate the desired results within seconds based on user instructions.

Semantic Segmentation Video Editing +3

A Survey on Video Diffusion Models

1 code implementation • 16 Oct 2023 • Zhen Xing, Qijun Feng, Haoran Chen, Qi Dai, Han Hu, Hang Xu, Zuxuan Wu, Yu-Gang Jiang

However, existing surveys mainly focus on diffusion models in the context of image generation, with few up-to-date reviews on their application in the video domain.

Image Generation Video Editing +2

1,289

SimDA: Simple Diffusion Adapter for Efficient Video Generation

no code implementations • 18 Aug 2023 • Zhen Xing, Qi Dai, Han Hu, Zuxuan Wu, Yu-Gang Jiang

In this work, we propose a Simple Diffusion Adapter (SimDA) that fine-tunes only 24M out of 1.1B parameters of a strong T2I model, adapting it to video generation in a parameter-efficient way.

Transfer Learning Video Editing +2

Implicit Temporal Modeling with Learnable Alignment for Video Recognition

1 code implementation • ICCV 2023 • Shuyuan Tu, Qi Dai, Zuxuan Wu, Zhi-Qi Cheng, Han Hu, Yu-Gang Jiang

While modeling temporal information within a straight-through tube is widely adopted in the literature, we find that simple frame alignment already provides enough essence without temporal attention.

Ranked #16 on Action Classification on Kinetics-400

Action Classification Action Recognition +1

ChartReader: A Unified Framework for Chart Derendering and Comprehension without Heuristic Rules

1 code implementation • ICCV 2023 • Zhi-Qi Cheng, Qi Dai, SiYao Li, Jingdong Sun, Teruko Mitamura, Alexander G. Hauptmann

We evaluate ChartReader on Chart-to-Table, ChartQA, and Chart-to-Text tasks, demonstrating its superiority over existing methods.

Derendering Language Modelling +1

Parallel Sentence-Level Explanation Generation for Real-World Low-Resource Scenarios

no code implementations • 21 Feb 2023 • Yan Liu, Xiaokang Chen, Qi Dai

However, current works pursuing sentence-level explanations rely heavily on annotated training data, which limits the development of interpretability to only a few tasks.

Explanation Generation Natural Language Inference +1

All in Tokens: Unifying Output Space of Visual Tasks via Soft Token

1 code implementation • ICCV 2023 • Jia Ning, Chen Li, Zheng Zhang, Zigang Geng, Qi Dai, Kun He, Han Hu

With these new techniques and other designs, we show that the proposed general-purpose task-solver can perform both instance segmentation and depth estimation well.

Ranked #14 on Monocular Depth Estimation on NYU-Depth V2

Instance Segmentation Monocular Depth Estimation +1

ResFormer: Scaling ViTs with Multi-Resolution Training

1 code implementation • CVPR 2023 • Rui Tian, Zuxuan Wu, Qi Dai, Han Hu, Yu Qiao, Yu-Gang Jiang

We introduce ResFormer, a framework built upon the seminal idea of multi-resolution training for improved performance on a wide spectrum of mostly unseen testing resolutions.

Action Recognition Image Classification +4

SVFormer: Semi-supervised Video Transformer for Action Recognition

1 code implementation • CVPR 2023 • Zhen Xing, Qi Dai, Han Hu, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang

In this paper, we investigate the use of transformer models under the SSL setting for action recognition.

Action Recognition Semi-Supervised Image Classification +1

GSRFormer: Grounded Situation Recognition Transformer with Alternate Semantic Attention Refinement

1 code implementation • 18 Aug 2022 • Zhi-Qi Cheng, Qi Dai, SiYao Li, Teruko Mitamura, Alexander G. Hauptmann

In the second stage, we exploit transformer layers to unearth the potential semantic relations within both verbs and semantic roles.

Grounded Situation Recognition Image Captioning +3

Rethinking Spatial Invariance of Convolutional Networks for Object Counting

1 code implementation • CVPR 2022 • Zhi-Qi Cheng, Qi Dai, Hong Li, Jingkuan Song, Xiao Wu, Alexander G. Hauptmann

We evaluate our methods on 4 mainstream object counting networks (i.e., MCNN, CSRNet, SANet, and ResNet-50).

Ranked #1 on Object Counting on TRANCOS

Crowd Counting Object +2

On Data Scaling in Masked Image Modeling

1 code implementation • CVPR 2023 • Zhenda Xie, Zheng Zhang, Yue Cao, Yutong Lin, Yixuan Wei, Qi Dai, Han Hu

Our study reveals that: (i) Masked image modeling is also demanding on larger data.

Self-Supervised Learning

12,955

HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling

1 code implementation • 30 May 2022 • Xiaosong Zhang, Yunjie Tian, Wei Huang, Qixiang Ye, Qi Dai, Lingxi Xie, Qi Tian

A key idea of efficient implementation is to discard the masked image patches (or tokens) throughout the target network (encoder), which requires the encoder to be a plain vision transformer (e.g., ViT), although hierarchical vision transformers (e.g., Swin Transformer) have potentially better properties in formulating vision inputs.

Transfer Learning

Deeper Insights into the Robustness of ViTs towards Common Corruptions

no code implementations • 26 Apr 2022 • Rui Tian, Zuxuan Wu, Qi Dai, Han Hu, Yu-Gang Jiang

With Vision Transformers (ViTs) making great advances in a variety of computer vision tasks, recent literature has proposed various variants of vanilla ViTs to achieve better efficiency and efficacy.

Benchmarking Data Augmentation

Multi-granularity Relabeled Under-sampling Algorithm for Imbalanced Data

no code implementations • 11 Jan 2022 • Qi Dai, Jian-wei Liu, Yang Liu

The Tomek-Link sampling algorithm can effectively reduce the class overlap on data, remove the majority instances that are difficult to distinguish, and improve the algorithm classification accuracy.

Classification imbalanced classification

SimMIM: A Simple Framework for Masked Image Modeling

4 code implementations • CVPR 2022 • Zhenda Xie, Zheng Zhang, Yue Cao, Yutong Lin, Jianmin Bao, Zhuliang Yao, Qi Dai, Han Hu

We also leverage this approach to facilitate the training of a 3B model (SwinV2-G), which, using 40× less data than previous practice, achieves the state of the art on four representative vision benchmarks.

Ranked #10 on Self-Supervised Image Classification on ImageNet (finetuned)

Representation Learning Self-Supervised Image Classification +1

869

Cross-Modal Attention Consistency for Video-Audio Unsupervised Learning

no code implementations • 13 Jun 2021 • Shaobo Min, Qi Dai, Hongtao Xie, Chuang Gan, Yongdong Zhang, Jingdong Wang

Cross-modal correlation provides an inherent supervision for video unsupervised representation learning.

Contrastive Learning Representation Learning

On the Connection between Local Attention and Dynamic Depth-wise Convolution

1 code implementation • ICLR 2022 • Qi Han, Zejia Fan, Qi Dai, Lei Sun, Ming-Ming Cheng, Jiaying Liu, Jingdong Wang

Sparse connectivity: there is no connection across channels, and each position is connected to the positions within a small local window.

object-detection Object Detection +2

179

Self-Supervised Learning with Swin Transformers

6 code implementations • 10 May 2021 • Zhenda Xie, Yutong Lin, Zhuliang Yao, Zheng Zhang, Qi Dai, Yue Cao, Han Hu

We are witnessing a modeling shift from CNN to Transformers in computer vision.

Ranked #75 on Self-Supervised Image Classification on ImageNet

object-detection Object Detection +3

12,955

Calibration of Human Driving Behavior and Preference Using Naturalistic Traffic Data

no code implementations • 5 May 2021 • Qi Dai, Di Shen, Jinhong Wang, Suzhou Huang, Dimitar Filev

Towards this end it is necessary that we have a comprehensive modeling framework for decision-making within which human driving preferences can be inferred statistically from observed driving behaviors in realistic and naturalistic traffic settings.

Autonomous Vehicles Decision Making

Learning to Estimate Kernel Scale and Orientation of Defocus Blur with Asymmetric Coded Aperture

no code implementations • 10 Mar 2021 • Jisheng Li, Qi Dai, Jiangtao Wen

Consistent in-focus input imagery is an essential precondition for machine vision systems to perceive the dynamic environment.

Temporal Action Detection with Multi-level Supervision

no code implementations • ICCV 2021 • Baifeng Shi, Qi Dai, Judy Hoffman, Kate Saenko, Trevor Darrell, Huijuan Xu

We extensively benchmark against the baselines for SSAD and OSAD on our created data splits in THUMOS14 and ActivityNet1.2, and demonstrate the effectiveness of the proposed UFA and IB methods.

Action Detection Semi-Supervised Action Detection

Towards a Systematic Computational Framework for Modeling Multi-Agent Decision-Making at Micro Level for Smart Vehicles in a Smart World

no code implementations • 25 Sep 2020 • Qi Dai, Xunnong Xu, Wen Guo, Suzhou Huang, Dimitar Filev

To demonstrate how our approach can be applied to realistic traffic settings, we conduct a simulation experiment to derive merging and yielding behaviors on a double-lane highway with an unexpected barrier.

Autonomous Vehicles Computational Efficiency +1

Informative Dropout for Robust Representation Learning: A Shape-bias Perspective

1 code implementation • ICML 2020 • Baifeng Shi, Dinghuai Zhang, Qi Dai, Zhanxing Zhu, Yadong Mu, Jingdong Wang

Specifically, we discriminate texture from shape based on local self-information in an image, and adopt a Dropout-like algorithm to decorrelate the model output from the local texture.

Domain Generalization Representation Learning

125

Reinforcing Short-Length Hashing

no code implementations • 24 Apr 2020 • Xingbo Liu, Xiushan Nie, Qi Dai, Yupan Huang, Yilong Yin

Due to the compelling efficiency in retrieval and storage, similarity-preserving hashing has been widely applied to approximate nearest neighbor search in large-scale image retrieval.

Image Retrieval Retrieval

Weakly-Supervised Action Localization by Generative Attention Modeling

1 code implementation • CVPR 2020 • Baifeng Shi, Qi Dai, Yadong Mu, Jingdong Wang

By maximizing the conditional probability with respect to the attention, the action and non-action frames are well separated.

Ranked #8 on Weakly Supervised Action Localization on ActivityNet-1.2

Weakly Supervised Action Localization Weakly-supervised Temporal Action Localization +1

136

Self-supervised Object Motion and Depth Estimation from Video

no code implementations • 9 Dec 2019 • Qi Dai, Vaishakh Patil, Simon Hecker, Dengxin Dai, Luc van Gool, Konrad Schindler

We present a self-supervised learning framework to estimate the individual object motion and monocular depth from video.

Depth Estimation Instance Segmentation +5

Improving the Learning of Multi-column Convolutional Neural Network for Crowd Counting

no code implementations • 17 Sep 2019 • Zhi-Qi Cheng, Jun-Xiu Li, Qi Dai, Xiao Wu, Jun-Yan He, Alexander Hauptmann

By minimizing the mutual information, each column is guided to learn features with different image scales.

Crowd Counting

Learning Spatial Awareness to Improve Crowd Counting

no code implementations • ICCV 2019 • Zhi-Qi Cheng, Jun-Xiu Li, Qi Dai, Xiao Wu, Alexander Hauptmann

Although the Maximum Excess over SubArrays (MESA) loss has been previously proposed to address the above issues by finding the rectangular subregion whose predicted density map has the maximum difference from the ground truth, it cannot be solved by gradient descent, thus can hardly be integrated into the deep learning framework.

Ranked #5 on Crowd Counting on WorldExpo’10

Crowd Counting Weakly-supervised Learning

Decoupling Localization and Classification in Single Shot Temporal Action Detection

1 code implementation • 16 Apr 2019 • Yupan Huang, Qi Dai, Yutong Lu

Each branch produces a set of action anchor layers by applying deconvolution to the feature maps of the main stream.

Ranked #26 on Temporal Action Localization on THUMOS’14

Action Detection Classification +2

Recurrent Tubelet Proposal and Recognition Networks for Action Detection

no code implementations • ECCV 2018 • Dong Li, Zhaofan Qiu, Qi Dai, Ting Yao, Tao Mei

The RTP initializes action proposals of the start frame through a Region Proposal Network and then estimates the movements of proposals in the next frame in a recurrent manner.

Action Detection Region Proposal
