Search Results for author: Zhiwu Qing

Found 27 papers, 20 papers with code

A Recipe for Scaling up Text-to-Video Generation with Text-free Videos

1 code implementation • 25 Dec 2023 • Xiang Wang, Shiwei Zhang, Hangjie Yuan, Zhiwu Qing, Biao Gong, Yingya Zhang, Yujun Shen, Changxin Gao, Nong Sang

Following such a pipeline, we study the effect of doubling the scale of training set (i. e., video-only WebVid10M) with some randomly collected text-free videos and are encouraged to observe the performance improvement (FID from 9. 67 to 8. 19 and FVD from 484 to 441), demonstrating the scalability of our approach.

Ranked #6 on Text-to-Video Generation on MSR-VTT

Text-to-Image Generation Text-to-Video Generation +2

Paper
Code

DreamVideo: Composing Your Dream Videos with Customized Subject and Motion

1 code implementation • 7 Dec 2023 • Yujie Wei, Shiwei Zhang, Zhiwu Qing, Hangjie Yuan, Zhiheng Liu, Yu Liu, Yingya Zhang, Jingren Zhou, Hongming Shan

In motion learning, we architect a motion adapter and fine-tune it on the given videos to effectively model the target motion pattern.

Image Generation Video Generation

2,613

Paper
Code

Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation

1 code implementation • 7 Dec 2023 • Zhiwu Qing, Shiwei Zhang, Jiayu Wang, Xiang Wang, Yujie Wei, Yingya Zhang, Changxin Gao, Nong Sang

At the structure level, we decompose the T2V task into two steps, including spatial reasoning and temporal reasoning, using a unified denoiser.

Ranked #5 on Text-to-Video Generation on MSR-VTT

Text-to-Video Generation Video Generation

2,613

Paper
Code

Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning

1 code implementation • ICCV 2023 • Zhiwu Qing, Shiwei Zhang, Ziyuan Huang, Yingya Zhang, Changxin Gao, Deli Zhao, Nong Sang

When pre-training on the large-scale Kinetics-710, we achieve 89. 7% on Kinetics-400 with a frozen ViT-L model, which verifies the scalability of DiST.

Transfer Learning Video Recognition

Paper
Code

HR-Pro: Point-supervised Temporal Action Localization via Hierarchical Reliability Propagation

1 code implementation • 24 Aug 2023 • Huaxin Zhang, Xiang Wang, Xiaohao Xu, Zhiwu Qing, Changxin Gao, Nong Sang

For snippet-level learning, we introduce an online-updated memory to store reliable snippet prototypes for each class.

Ranked #1 on Weakly Supervised Action Localization on BEOID

Weakly Supervised Action Localization

Paper
Code

Temporally-Adaptive Models for Efficient Video Understanding

1 code implementation • 10 Aug 2023 • Ziyuan Huang, Shiwei Zhang, Liang Pan, Zhiwu Qing, Yingya Zhang, Ziwei Liu, Marcelo H. Ang Jr

Spatial convolutions are extensively used in numerous deep video models.

Ranked #3 on Action Recognition on EPIC-KITCHENS-100 (using extra training data)

Action Classification Action Recognition +1

216

Paper
Code

MoLo: Motion-augmented Long-short Contrastive Learning for Few-shot Action Recognition

1 code implementation • CVPR 2023 • Xiang Wang, Shiwei Zhang, Zhiwu Qing, Changxin Gao, Yingya Zhang, Deli Zhao, Nong Sang

To address these issues, we develop a Motion-augmented Long-short Contrastive Learning (MoLo) method that contains two crucial components, including a long-short contrastive objective and a motion autodecoder.

Contrastive Learning Few-Shot action recognition +1

Paper
Code

Enlarging Instance-specific and Class-specific Information for Open-set Action Recognition

1 code implementation • CVPR 2023 • Jun Cen, Shiwei Zhang, Xiang Wang, Yixuan Pei, Zhiwu Qing, Yingya Zhang, Qifeng Chen

In this paper, we begin with analyzing the feature representation behavior in the open-set action recognition (OSAR) problem based on the information bottleneck (IB) theory, and propose to enlarge the instance-specific (IS) and class-specific (CS) information contained in the feature for better performance.

Open Set Action Recognition

Paper
Code

HyRSM++: Hybrid Relation Guided Temporal Set Matching for Few-shot Action Recognition

1 code implementation • 9 Jan 2023 • Xiang Wang, Shiwei Zhang, Zhiwu Qing, Zhengrong Zuo, Changxin Gao, Rong Jin, Nong Sang

To be specific, HyRSM++ consists of two key components, a hybrid relation module and a temporal set matching metric.

Few-Shot action recognition Few Shot Action Recognition +3

Paper
Code

Space-time Prompting for Video Class-incremental Learning

no code implementations • ICCV 2023 • Yixuan Pei, Zhiwu Qing, Shiwei Zhang, Xiang Wang, Yingya Zhang, Deli Zhao, Xueming Qian

In this paper, we will fill this gap by learning multiple prompts based on a powerful image-language pre-trained model, i. e., CLIP, making it fit for video class-incremental learning (VCIL).

Class Incremental Learning Incremental Learning

Paper
Add Code

Learning a Condensed Frame for Memory-Efficient Video Class-Incremental Learning

no code implementations • 2 Nov 2022 • Yixuan Pei, Zhiwu Qing, Jun Cen, Xiang Wang, Shiwei Zhang, Yaxiong Wang, Mingqian Tang, Nong Sang, Xueming Qian

The former is to reduce the memory cost by preserving only one condensed frame instead of the whole video, while the latter aims to compensate the lost spatio-temporal details in the Frame Condensing stage.

Action Recognition Class Incremental Learning +1

Paper
Add Code

MAR: Masked Autoencoders for Efficient Action Recognition

1 code implementation • 24 Jul 2022 • Zhiwu Qing, Shiwei Zhang, Ziyuan Huang, Xiang Wang, Yuehuan Wang, Yiliang Lv, Changxin Gao, Nong Sang

Inspired by this, we propose propose Masked Action Recognition (MAR), which reduces the redundant computation by discarding a proportion of patches and operating only on a part of the videos.

Ranked #12 on Action Recognition on Something-Something V2

Action Classification Action Recognition +1

Paper
Code

Hybrid Relation Guided Set Matching for Few-shot Action Recognition

1 code implementation • CVPR 2022 • Xiang Wang, Shiwei Zhang, Zhiwu Qing, Mingqian Tang, Zhengrong Zuo, Changxin Gao, Rong Jin, Nong Sang

To overcome the two limitations, we propose a novel Hybrid Relation guided Set Matching (HyRSM) approach that incorporates two key components: hybrid relation module and set matching metric.

Ranked #1 on Few Shot Action Recognition on Something-Something-100

Few Shot Action Recognition Relation +1

Paper
Code

Learning from Untrimmed Videos: Self-Supervised Video Representation Learning with Hierarchical Consistency

no code implementations • CVPR 2022 • Zhiwu Qing, Shiwei Zhang, Ziyuan Huang, Yi Xu, Xiang Wang, Mingqian Tang, Changxin Gao, Rong Jin, Nong Sang

In this work, we aim to learn representations by leveraging more abundant information in untrimmed videos.

Contrastive Learning Representation Learning +1

Paper
Add Code

TAda! Temporally-Adaptive Convolutions for Video Understanding

2 code implementations • ICLR 2022 • Ziyuan Huang, Shiwei Zhang, Liang Pan, Zhiwu Qing, Mingqian Tang, Ziwei Liu, Marcelo H. Ang Jr

This work presents Temporally-Adaptive Convolutions (TAdaConv) for video understanding, which shows that adaptive weight calibration along the temporal dimension is an efficient way to facilitate modelling complex temporal dynamics in videos.

Ranked #67 on Action Recognition on Something-Something V2 (using extra training data)

Action Classification Action Recognition +2

216

Paper
Code

ParamCrop: Parametric Cubic Cropping for Video Contrastive Learning

1 code implementation • 24 Aug 2021 • Zhiwu Qing, Ziyuan Huang, Shiwei Zhang, Mingqian Tang, Changxin Gao, Marcelo H. Ang Jr, Rong Jin, Nong Sang

The visualizations show that ParamCrop adaptively controls the center distance and the IoU between two augmented views, and the learned change in the disparity along the training process is beneficial to learning a strong representation.

Contrastive Learning

216

Paper
Code

Exploring Stronger Feature for Temporal Action Localization

no code implementations • 24 Jun 2021 • Zhiwu Qing, Xiang Wang, Ziyuan Huang, Yutong Feng, Shiwei Zhang, Jianwen Jiang, Mingqian Tang, Changxin Gao, Nong Sang

Temporal action localization aims to localize starting and ending time with action category.

Temporal Action Localization

Paper
Add Code

OadTR: Online Action Detection with Transformers

1 code implementation • ICCV 2021 • Xiang Wang, Shiwei Zhang, Zhiwu Qing, Yuanjie Shao, Zhengrong Zuo, Changxin Gao, Nong Sang

Most recent approaches for online action detection tend to apply Recurrent Neural Network (RNN) to capture long-range temporal structure.

Ranked #8 on Online Action Detection on THUMOS'14

Decoder Online Action Detection

Paper
Code

Weakly-Supervised Temporal Action Localization Through Local-Global Background Modeling

no code implementations • 20 Jun 2021 • Xiang Wang, Zhiwu Qing, Ziyuan Huang, Yutong Feng, Shiwei Zhang, Jianwen Jiang, Mingqian Tang, Yuanjie Shao, Nong Sang

Then our proposed Local-Global Background Modeling Network (LGBM-Net) is trained to localize instances by using only video-level labels based on Multi-Instance Learning (MIL).

Weakly-supervised Learning Weakly-supervised Temporal Action Localization +1

Paper
Add Code

Proposal Relation Network for Temporal Action Detection

1 code implementation • 20 Jun 2021 • Xiang Wang, Zhiwu Qing, Ziyuan Huang, Yutong Feng, Shiwei Zhang, Jianwen Jiang, Mingqian Tang, Changxin Gao, Nong Sang

We calculate the detection results by assigning the proposals with corresponding classification results.

Ranked #2 on Temporal Action Localization on ActivityNet-1.3 (using extra training data)

Action Classification Action Detection +3

Paper
Code

Relation Modeling in Spatio-Temporal Action Localization

no code implementations • 15 Jun 2021 • Yutong Feng, Jianwen Jiang, Ziyuan Huang, Zhiwu Qing, Xiang Wang, Shiwei Zhang, Mingqian Tang, Yue Gao

This paper presents our solution to the AVA-Kinetics Crossover Challenge of ActivityNet workshop at CVPR 2021.

Ranked #4 on Spatio-Temporal Action Localization on AVA-Kinetics (using extra training data)

Action Detection Relation +2

Paper
Add Code

A Stronger Baseline for Ego-Centric Action Detection

1 code implementation • 13 Jun 2021 • Zhiwu Qing, Ziyuan Huang, Xiang Wang, Yutong Feng, Shiwei Zhang, Jianwen Jiang, Mingqian Tang, Changxin Gao, Marcelo H. Ang Jr, Nong Sang

This technical report analyzes an egocentric video action detection method we used in the 2021 EPIC-KITCHENS-100 competition hosted in CVPR2021 Workshop.

Action Detection

216

Paper
Code

Towards Training Stronger Video Vision Transformers for EPIC-KITCHENS-100 Action Recognition

1 code implementation • 9 Jun 2021 • Ziyuan Huang, Zhiwu Qing, Xiang Wang, Yutong Feng, Shiwei Zhang, Jianwen Jiang, Zhurong Xia, Mingqian Tang, Nong Sang, Marcelo H. Ang Jr

In this paper, we present empirical results for training a stronger video vision transformer on the EPIC-KITCHENS-100 Action Recognition dataset.

Action Recognition Point Cloud Classification +1

216

Paper
Code

Self-Supervised Learning for Semi-Supervised Temporal Action Proposal

1 code implementation • CVPR 2021 • Xiang Wang, Shiwei Zhang, Zhiwu Qing, Yuanjie Shao, Changxin Gao, Nong Sang

In this paper, we focus on applying the power of self-supervised methods to improve semi-supervised action proposal generation.

Ranked #2 on Semi-Supervised Action Detection on ActivityNet-1.3

Relation Self-Supervised Learning +2

Paper
Code

Temporal Context Aggregation Network for Temporal Action Proposal Refinement

1 code implementation • CVPR 2021 • Zhiwu Qing, Haisheng Su, Weihao Gan, Dongliang Wang, Wei Wu, Xiang Wang, Yu Qiao, Junjie Yan, Changxin Gao, Nong Sang

In this paper, we propose Temporal Context Aggregation Network (TCANet) to generate high-quality action proposals through "local and global" temporal context aggregation and complementary as well as progressive boundary refinement.

Ranked #9 on Temporal Action Localization on ActivityNet-1.3

Action Detection Retrieval +2

Paper
Code

Temporal Fusion Network for Temporal Action Localization:Submission to ActivityNet Challenge 2020 (Task E)

no code implementations • 13 Jun 2020 • Zhiwu Qing, Xiang Wang, Yongpeng Sang, Changxin Gao, Shiwei Zhang, Nong Sang

This technical report analyzes a temporal action localization method we used in the HACS competition which is hosted in Activitynet Challenge 2020. The goal of our task is to locate the start time and end time of the action in the untrimmed video, and predict action category. Firstly, we utilize the video-level feature information to train multiple video-level action classification models.

Action Classification Temporal Action Localization

Paper
Add Code

CBR-Net: Cascade Boundary Refinement Network for Action Detection: Submission to ActivityNet Challenge 2020 (Task 1)

1 code implementation • 13 Jun 2020 • Xiang Wang, Baiteng Ma, Zhiwu Qing, Yongpeng Sang, Changxin Gao, Shiwei Zhang, Nong Sang

In this report, we present our solution for the task of temporal action localization (detection) (task 1) in ActivityNet Challenge 2020.

Action Detection Temporal Action Localization

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.