no code implementations • 20 Apr 2024 • Yangcen Liu, Ziyi Liu, Yuanhao Zhai, Wen Li, David Doerman, Junsong Yuan
To address this problem, we propose the Generalizable Temporal Action Localization task (GTAL), which focuses on improving the generalizability of action localization methods.
Weakly-supervised Temporal Action Localization Weakly Supervised Temporal Action Localization
1 code implementation • ICCV 2023 • Yuanhao Zhai, Ziyi Liu, Zhenyu Wu, Yi Wu, Chunluan Zhou, David Doermann, Junsong Yuan, Gang Hua
The former prevents the decoder from reconstructing the video background given video features, and thus helps reduce the background information in feature learning.
1 code implementation • 18 Aug 2023 • Yuanhao Zhai, Mingzhen Huang, Tianyu Luan, Lu Dong, Ifeoma Nwogu, Siwei Lyu, David Doermann, Junsong Yuan
In this paper, we propose ATOM (ATomic mOtion Modeling) to mitigate this problem, by decomposing actions into atomic actions, and employing a curriculum learning strategy to learn atomic action composition.
1 code implementation • CVPR 2023 • Tianyu Luan, Yuanhao Zhai, Jingjing Meng, Zhong Li, Zhang Chen, Yi Xu, Junsong Yuan
To capture high-frequency personalized details, we transform the 3D mesh into the frequency domain, and propose a novel frequency decomposition loss to supervise each frequency component.
1 code implementation • 30 Jun 2023 • Tan Wang, Linjie Li, Kevin Lin, Yuanhao Zhai, Chung-Ching Lin, Zhengyuan Yang, Hanwang Zhang, Zicheng Liu, Lijuan Wang
In this paper, we depart from the traditional paradigm of human motion transfer and emphasize two additional critical attributes for the synthesis of human dance content in social media contexts: (i) Generalizability: the model should be able to generalize beyond generic human viewpoints as well as unseen human subjects, backgrounds, and poses; (ii) Compositionality: it should allow for the seamless composition of seen/unseen subjects, backgrounds, and poses from different sources.
no code implementations • ICCV 2023 • Yuanhao Zhai, Tianyu Luan, David Doermann, Junsong Yuan
To improve the generalization ability, we propose weakly-supervised self-consistency learning (WSCL) to leverage the weakly annotated images.
no code implementations • 21 Jun 2021 • Yuanhao Zhai, Le Wang, David Doermann, Junsong Yuan
The base model training encourages the model to predict reliable predictions based on single modality (i. e., RGB or optical flow), based on the fusion of which a pseudo ground truth is generated and in turn used as supervision to train the base models.
no code implementations • ECCV 2020 • Yuanhao Zhai, Le Wang, Wei Tang, Qilin Zhang, Junsong Yuan, Gang Hua
Weakly-supervised Temporal Action Localization (W-TAL) aims to classify and localize all action instances in an untrimmed video under only video-level supervision.
Ranked #12 on Weakly Supervised Action Localization on THUMOS14
Vocal Bursts Valence Prediction Weakly Supervised Action Localization +2