Search Results for author: Zhiwu Qing

Found 27 papers, 20 papers with code

A Recipe for Scaling up Text-to-Video Generation with Text-free Videos

1 code implementation · 25 Dec 2023 · Xiang Wang, Shiwei Zhang, Hangjie Yuan, Zhiwu Qing, Biao Gong, Yingya Zhang, Yujun Shen, Changxin Gao, Nong Sang

Following such a pipeline, we study the effect of doubling the scale of the training set (i.e., video-only WebVid10M) with some randomly collected text-free videos and are encouraged to observe a performance improvement (FID from 9.67 to 8.19 and FVD from 484 to 441), demonstrating the scalability of our approach.

Text-to-Image Generation · Text-to-Video Generation · +2

DreamVideo: Composing Your Dream Videos with Customized Subject and Motion

1 code implementation · 7 Dec 2023 · Yujie Wei, Shiwei Zhang, Zhiwu Qing, Hangjie Yuan, Zhiheng Liu, Yu Liu, Yingya Zhang, Jingren Zhou, Hongming Shan

In motion learning, we architect a motion adapter and fine-tune it on the given videos to effectively model the target motion pattern.

Image Generation · Video Generation
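
As a rough illustration of the adapter idea described in the snippet above, here is a minimal PyTorch sketch of a bottleneck adapter fine-tuned on top of a frozen backbone; the `MotionAdapter` name, dimensions, and placement are illustrative assumptions, not DreamVideo's actual architecture.

```python
import torch
import torch.nn as nn

class MotionAdapter(nn.Module):
    """Bottleneck adapter: a small residual MLP inserted into a frozen
    backbone; only the adapter is fine-tuned to fit the target motion."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)  # start as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))  # residual update

adapter = MotionAdapter(dim=1024)
feats = torch.randn(2, 16, 1024)  # (batch, frames, channels), hypothetical
out = adapter(feats)              # same shape, motion-adapted features
```

Zero-initializing the up-projection keeps the frozen model's behavior unchanged at the start of fine-tuning, a common design choice for adapters.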

Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation

1 code implementation · 7 Dec 2023 · Zhiwu Qing, Shiwei Zhang, Jiayu Wang, Xiang Wang, Yujie Wei, Yingya Zhang, Changxin Gao, Nong Sang

At the structure level, we decompose the T2V task into two steps, namely spatial reasoning and temporal reasoning, using a unified denoiser.

Text-to-Video Generation · Video Generation

Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning

1 code implementation · ICCV 2023 · Zhiwu Qing, Shiwei Zhang, Ziyuan Huang, Yingya Zhang, Changxin Gao, Deli Zhao, Nong Sang

When pre-training on the large-scale Kinetics-710, we achieve 89.7% on Kinetics-400 with a frozen ViT-L model, which verifies the scalability of DiST.

Transfer Learning · Video Recognition

MoLo: Motion-augmented Long-short Contrastive Learning for Few-shot Action Recognition

1 code implementation · CVPR 2023 · Xiang Wang, Shiwei Zhang, Zhiwu Qing, Changxin Gao, Yingya Zhang, Deli Zhao, Nong Sang

To address these issues, we develop a Motion-augmented Long-short Contrastive Learning (MoLo) method that contains two crucial components: a long-short contrastive objective and a motion autodecoder.

Contrastive Learning · Few-Shot Action Recognition · +1
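
To make the long-short contrastive objective above concrete, here is a hedged sketch of an InfoNCE-style loss between long-view and short-view embeddings; the loss form and temperature are common defaults, not necessarily MoLo's exact formulation.

```python
import torch
import torch.nn.functional as F

def long_short_contrastive(long_feat, short_feat, tau: float = 0.07):
    """long_feat, short_feat: (B, D) embeddings of long and short views.
    Pulls each short view toward its own long view, pushes it from others."""
    long_feat = F.normalize(long_feat, dim=-1)
    short_feat = F.normalize(short_feat, dim=-1)
    logits = short_feat @ long_feat.t() / tau   # (B, B) cosine similarities
    targets = torch.arange(logits.size(0))      # positives on the diagonal
    return F.cross_entropy(logits, targets)

loss = long_short_contrastive(torch.randn(8, 256), torch.randn(8, 256))
```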

Enlarging Instance-specific and Class-specific Information for Open-set Action Recognition

1 code implementation · CVPR 2023 · Jun Cen, Shiwei Zhang, Xiang Wang, Yixuan Pei, Zhiwu Qing, Yingya Zhang, Qifeng Chen

In this paper, we begin by analyzing the feature representation behavior in the open-set action recognition (OSAR) problem based on the information bottleneck (IB) theory, and propose to enlarge the instance-specific (IS) and class-specific (CS) information contained in the feature for better performance.

Open Set Action Recognition

Space-time Prompting for Video Class-incremental Learning

no code implementations · ICCV 2023 · Yixuan Pei, Zhiwu Qing, Shiwei Zhang, Xiang Wang, Yingya Zhang, Deli Zhao, Xueming Qian

In this paper, we fill this gap by learning multiple prompts based on a powerful image-language pre-trained model, i.e., CLIP, making it fit for video class-incremental learning (VCIL).

Class Incremental Learning · Incremental Learning
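
A minimal sketch of the prompt-learning idea above, assuming one learnable prompt set per incremental task prepended to frozen CLIP tokens; the `PromptPool` structure and sizes are illustrative, not the paper's actual design.

```python
import torch
import torch.nn as nn

class PromptPool(nn.Module):
    def __init__(self, num_tasks: int, prompt_len: int, dim: int):
        super().__init__()
        # One learnable prompt per task; during training one would
        # optimize only the current task's prompt, leaving earlier
        # prompts intact to avoid forgetting.
        self.prompts = nn.Parameter(torch.randn(num_tasks, prompt_len, dim) * 0.02)

    def forward(self, tokens: torch.Tensor, task_id: int) -> torch.Tensor:
        b = tokens.size(0)
        p = self.prompts[task_id].unsqueeze(0).expand(b, -1, -1)
        return torch.cat([p, tokens], dim=1)  # (B, prompt_len + N, D)

pool = PromptPool(num_tasks=10, prompt_len=8, dim=768)
tokens = torch.randn(4, 196, 768)      # frozen CLIP patch tokens (hypothetical)
extended = pool(tokens, task_id=0)     # prompt-augmented token sequence
```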

Learning a Condensed Frame for Memory-Efficient Video Class-Incremental Learning

no code implementations · 2 Nov 2022 · Yixuan Pei, Zhiwu Qing, Jun Cen, Xiang Wang, Shiwei Zhang, Yaxiong Wang, Mingqian Tang, Nong Sang, Xueming Qian

The former reduces the memory cost by preserving only one condensed frame instead of the whole video, while the latter aims to compensate for the lost spatio-temporal details in the Frame Condensing stage.

Action Recognition · Class Incremental Learning · +1
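
A hedged sketch of the frame-condensing idea above: a T-frame clip is collapsed into a single stored frame via learned per-frame weights. The weighting scheme here is an assumption for illustration; the paper's actual condensing mechanism may differ.

```python
import torch
import torch.nn as nn

class FrameCondenser(nn.Module):
    def __init__(self, num_frames: int):
        super().__init__()
        # Learned importance score per frame position.
        self.logits = nn.Parameter(torch.zeros(num_frames))

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        # video: (B, T, C, H, W) -> condensed frame: (B, C, H, W)
        w = self.logits.softmax(dim=0).view(1, -1, 1, 1, 1)
        return (video * w).sum(dim=1)  # weighted blend of all frames

condenser = FrameCondenser(num_frames=16)
frame = condenser(torch.randn(2, 16, 3, 224, 224))  # one frame per video
```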

MAR: Masked Autoencoders for Efficient Action Recognition

1 code implementation · 24 Jul 2022 · Zhiwu Qing, Shiwei Zhang, Ziyuan Huang, Xiang Wang, Yuehuan Wang, Yiliang Lv, Changxin Gao, Nong Sang

Inspired by this, we propose Masked Action Recognition (MAR), which reduces redundant computation by discarding a proportion of patches and operating only on a part of each video.

Action Classification · Action Recognition · +1
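
The snippet above hinges on running the encoder only on the kept patches. A minimal sketch of that selection step follows; the masking ratio and random scoring are illustrative defaults, not MAR's exact sampling strategy.

```python
import torch

def keep_visible_patches(tokens: torch.Tensor, mask_ratio: float = 0.5):
    """tokens: (B, N, D) patch tokens. Returns kept tokens and indices."""
    b, n, d = tokens.shape
    num_keep = int(n * (1 - mask_ratio))
    noise = torch.rand(b, n)                        # random score per patch
    keep_idx = noise.argsort(dim=1)[:, :num_keep]   # random subset to keep
    kept = torch.gather(tokens, 1, keep_idx.unsqueeze(-1).expand(-1, -1, d))
    return kept, keep_idx

tokens = torch.randn(2, 1568, 768)   # e.g. 16 frames x 98 patches, hypothetical
kept, idx = keep_visible_patches(tokens, mask_ratio=0.5)  # ~half the compute
```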

Hybrid Relation Guided Set Matching for Few-shot Action Recognition

1 code implementation · CVPR 2022 · Xiang Wang, Shiwei Zhang, Zhiwu Qing, Mingqian Tang, Zhengrong Zuo, Changxin Gao, Rong Jin, Nong Sang

To overcome the two limitations, we propose a novel Hybrid Relation guided Set Matching (HyRSM) approach that incorporates two key components: a hybrid relation module and a set matching metric.

Few-Shot Action Recognition · Relation · +1
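
As one plausible reading of a set matching metric, the sketch below scores two videos by the mean of bidirectional best-match frame similarities; this is a common Hausdorff-style form, and HyRSM's actual metric may differ.

```python
import torch
import torch.nn.functional as F

def set_matching_score(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """a: (N, D), b: (M, D) frame features of two videos, treated as sets."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    sim = a @ b.t()                                  # (N, M) pairwise cosine
    return 0.5 * (sim.max(dim=1).values.mean()       # best match per a-frame
                  + sim.max(dim=0).values.mean())    # best match per b-frame

score = set_matching_score(torch.randn(8, 512), torch.randn(8, 512))
```

Treating frames as an unordered set makes the score robust to temporal misalignment between the query and support videos.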

TAda! Temporally-Adaptive Convolutions for Video Understanding

2 code implementations · ICLR 2022 · Ziyuan Huang, Shiwei Zhang, Liang Pan, Zhiwu Qing, Mingqian Tang, Ziwei Liu, Marcelo H. Ang Jr

This work presents Temporally-Adaptive Convolutions (TAdaConv) for video understanding, which shows that adaptive weight calibration along the temporal dimension is an efficient way to facilitate modelling complex temporal dynamics in videos.

Ranked #67 on Action Recognition on Something-Something V2 (using extra training data)

Action Classification · Action Recognition · +2
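
A simplified sketch of the temporally-adaptive idea above: a shared 2D kernel is applied per frame and its output re-scaled by a per-frame calibration factor predicted from temporal context. Note this simplification calibrates outputs channel-wise, whereas TAdaConv calibrates the convolution weights themselves.

```python
import torch
import torch.nn as nn

class TAdaConv2dLite(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.base = nn.Conv2d(channels, channels, kernel_size,
                              padding=kernel_size // 2, bias=False)
        # Predicts a per-frame, per-channel scale from pooled temporal context.
        self.calib = nn.Conv1d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, T, H, W)
        b, c, t, h, w = x.shape
        ctx = x.mean(dim=(3, 4))                  # (B, C, T) temporal context
        alpha = torch.sigmoid(self.calib(ctx))    # per-frame channel scales
        y = self.base(x.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w))
        y = y.reshape(b, t, c, h, w).permute(0, 2, 1, 3, 4)
        return y * alpha.unsqueeze(-1).unsqueeze(-1)  # frame-adaptive output

layer = TAdaConv2dLite(channels=64)
out = layer(torch.randn(2, 64, 8, 56, 56))  # same shape as the input
```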

ParamCrop: Parametric Cubic Cropping for Video Contrastive Learning

1 code implementation · 24 Aug 2021 · Zhiwu Qing, Ziyuan Huang, Shiwei Zhang, Mingqian Tang, Changxin Gao, Marcelo H. Ang Jr, Rong Jin, Nong Sang

The visualizations show that ParamCrop adaptively controls the center distance and the IoU between two augmented views, and that the learned change in disparity over the course of training is beneficial to learning a strong representation.

Contrastive Learning
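
The core mechanism here is a crop whose parameters stay differentiable. Below is a minimal sketch using an affine sampling grid; the controller and adversarial objective that actually drive ParamCrop are omitted, and the parameter names are illustrative.

```python
import torch
import torch.nn.functional as F

def parametric_crop(img: torch.Tensor, cx: float, cy: float, scale: float):
    """img: (B, C, H, W); cx, cy in [-1, 1]; scale in (0, 1].
    Passing grad-enabled tensors for cx/cy/scale makes the crop trainable."""
    b = img.size(0)
    theta = torch.zeros(b, 2, 3)
    theta[:, 0, 0] = scale   # crop width relative to the image
    theta[:, 1, 1] = scale   # crop height relative to the image
    theta[:, 0, 2] = cx      # horizontal crop centre
    theta[:, 1, 2] = cy      # vertical crop centre
    grid = F.affine_grid(theta, img.size(), align_corners=False)
    return F.grid_sample(img, grid, align_corners=False)

img = torch.randn(2, 3, 224, 224)
view = parametric_crop(img, cx=0.2, cy=-0.1, scale=0.6)  # zoomed-in crop
```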

OadTR: Online Action Detection with Transformers

1 code implementation · ICCV 2021 · Xiang Wang, Shiwei Zhang, Zhiwu Qing, Yuanjie Shao, Zhengrong Zuo, Changxin Gao, Nong Sang

Most recent approaches for online action detection tend to apply Recurrent Neural Networks (RNNs) to capture long-range temporal structure.

Online Action Detection

Weakly-Supervised Temporal Action Localization Through Local-Global Background Modeling

no code implementations · 20 Jun 2021 · Xiang Wang, Zhiwu Qing, Ziyuan Huang, Yutong Feng, Shiwei Zhang, Jianwen Jiang, Mingqian Tang, Yuanjie Shao, Nong Sang

Then our proposed Local-Global Background Modeling Network (LGBM-Net) is trained to localize instances by using only video-level labels based on Multi-Instance Learning (MIL).

Weakly-supervised Learning · Weakly-supervised Temporal Action Localization · +1
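
For readers unfamiliar with MIL-based localization as used above, here is a hedged sketch of the video-level supervision step: per-snippet class scores are pooled (top-k mean, a common choice, not necessarily LGBM-Net's) into a video-level prediction trained against the video-level label.

```python
import torch
import torch.nn.functional as F

def mil_loss(snippet_logits: torch.Tensor, video_labels: torch.Tensor,
             k: int = 8):
    """snippet_logits: (B, T, C) per-snippet class scores;
    video_labels: (B, C) multi-hot video-level labels."""
    topk = snippet_logits.topk(k, dim=1).values   # most confident snippets
    video_logits = topk.mean(dim=1)               # (B, C) video-level score
    return F.binary_cross_entropy_with_logits(video_logits, video_labels)

loss = mil_loss(torch.randn(4, 100, 20), torch.randint(0, 2, (4, 20)).float())
```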

Relation Modeling in Spatio-Temporal Action Localization

no code implementations · 15 Jun 2021 · Yutong Feng, Jianwen Jiang, Ziyuan Huang, Zhiwu Qing, Xiang Wang, Shiwei Zhang, Mingqian Tang, Yue Gao

This paper presents our solution to the AVA-Kinetics Crossover Challenge of the ActivityNet workshop at CVPR 2021.

Ranked #4 on Spatio-Temporal Action Localization on AVA-Kinetics (using extra training data)

Action Detection · Relation · +2

A Stronger Baseline for Ego-Centric Action Detection

1 code implementation · 13 Jun 2021 · Zhiwu Qing, Ziyuan Huang, Xiang Wang, Yutong Feng, Shiwei Zhang, Jianwen Jiang, Mingqian Tang, Changxin Gao, Marcelo H. Ang Jr, Nong Sang

This technical report analyzes an egocentric video action detection method we used in the 2021 EPIC-KITCHENS-100 competition hosted in the CVPR 2021 Workshop.

Action Detection

Temporal Context Aggregation Network for Temporal Action Proposal Refinement

1 code implementation · CVPR 2021 · Zhiwu Qing, Haisheng Su, Weihao Gan, Dongliang Wang, Wei Wu, Xiang Wang, Yu Qiao, Junjie Yan, Changxin Gao, Nong Sang

In this paper, we propose Temporal Context Aggregation Network (TCANet) to generate high-quality action proposals through "local and global" temporal context aggregation and complementary as well as progressive boundary refinement.

Action Detection · Retrieval · +2
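
A rough sketch of the "local and global" temporal context aggregation mentioned above: a 1D convolution supplies local context and self-attention supplies global context, fused residually. This is a simplification of TCANet's modules with illustrative dimensions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class LocalGlobalContext(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.local = nn.Conv1d(dim, dim, kernel_size=3, padding=1)
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, D) proposal-aligned snippet features
        local = self.local(x.transpose(1, 2)).transpose(1, 2)  # local context
        global_, _ = self.attn(x, x, x)                        # global context
        return x + local + global_    # fuse both context scales residually

block = LocalGlobalContext(dim=256)
feat = block(torch.randn(2, 64, 256))  # same shape, context-enriched
```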

Temporal Fusion Network for Temporal Action Localization: Submission to ActivityNet Challenge 2020 (Task E)

no code implementations · 13 Jun 2020 · Zhiwu Qing, Xiang Wang, Yongpeng Sang, Changxin Gao, Shiwei Zhang, Nong Sang

This technical report analyzes a temporal action localization method we used in the HACS competition hosted in the ActivityNet Challenge 2020. The goal of the task is to locate the start and end times of actions in untrimmed videos and to predict the action category. First, we utilize video-level feature information to train multiple video-level action classification models.

Action Classification · Temporal Action Localization
