no code implementations • 26 Mar 2024 • Qingping Sun, Yanjun Wang, Ailing Zeng, Wanqi Yin, Chen Wei, Wenjia Wang, Haiyi Mei, Chi Sing Leung, Ziwei Liu, Lei Yang, Zhongang Cai
Expressive human pose and shape estimation (a.k.a.
1 code implementation • 25 Jan 2024 • Tianhe Ren, Shilong Liu, Ailing Zeng, Jing Lin, Kunchang Li, He Cao, Jiayu Chen, Xinyu Huang, Yukang Chen, Feng Yan, Zhaoyang Zeng, Hao Zhang, Feng Li, Jie Yang, Hongyang Li, Qing Jiang, Lei Zhang
We introduce Grounded SAM, which uses Grounding DINO as an open-set object detector to combine with the segment anything model (SAM).
1 code implementation • 18 Jan 2024 • Xuangeng Chu, Yu Li, Ailing Zeng, Tianyu Yang, Lijian Lin, Yunfei Liu, Tatsuya Harada
Head avatar reconstruction, crucial for applications in virtual reality, online meetings, gaming, and film industries, has garnered substantial attention within the computer vision community.
no code implementations • 9 Jan 2024 • Junming Chen, Yunfei Liu, Jianan Wang, Ailing Zeng, Yu Li, Qifeng Chen
We propose DiffSHEG, a Diffusion-based approach for Speech-driven Holistic 3D Expression and Gesture generation with arbitrary length.
1 code implementation • 9 Dec 2023 • Junzhe Lu, Jing Lin, Hongkun Dou, Ailing Zeng, Yue Deng, Yulun Zhang, Haoqian Wang
Our approach demonstrates considerable enhancements over common uniform scheduling used in image domains, boasting improvements of 5.4%, 17.2%, and 3.8% across human mesh recovery, pose completion, and motion denoising, respectively.
no code implementations • 7 Dec 2023 • Yinhuai Wang, Jing Lin, Ailing Zeng, Zhengyi Luo, Jian Zhang, Lei Zhang
To make up for the lack of dynamic HOI scenarios in this area, we introduce the BallPlay dataset that contains eight whole-body basketball skills.
no code implementations • 19 Oct 2023 • Shunlin Lu, Ling-Hao Chen, Ailing Zeng, Jing Lin, Ruimao Zhang, Lei Zhang, Heung-Yeung Shum
This work targets a novel text-driven whole-body motion generation task, which takes a given textual description as input and aims at generating high-quality, diverse, and coherent facial expressions, hand gestures, and body motions simultaneously.
1 code implementation • 12 Oct 2023 • Jie Yang, Ailing Zeng, Ruimao Zhang, Lei Zhang
This work proposes a unified framework called UniPose to detect keypoints of any articulated (e.g., human and animal), rigid, and soft objects via visual or textual prompts for fine-grained vision understanding and manipulation.
Ranked #1 on 2D Human Pose Estimation on Human-Art (using extra training data)
no code implementations • 6 Oct 2023 • Xinpeng Liu, Yong-Lu Li, Ailing Zeng, Zizheng Zhou, Yang You, Cewu Lu
The goal of motion understanding is to establish a reliable mapping between motion and action semantics, which is a challenging many-to-many problem.
1 code implementation • 2 Oct 2023 • Xuan Ju, Ailing Zeng, Yuxuan Bian, Shaoteng Liu, Qiang Xu
Specifically, in the context of diffusion-based editing, where a source image is edited according to a target prompt, the process commences by acquiring a noisy latent vector corresponding to the source image via the diffusion model.
Ranked #3 on Text-based Image Editing on PIE-Bench
1 code implementation • 10 Sep 2023 • Jiong Wang, Fengyu Yang, Wenbo Gou, Bingliang Li, Danqi Yan, Ailing Zeng, Yijun Gao, Junle Wang, Yanqing Jing, Ruimao Zhang
To facilitate the development of 3D pose estimation, we present FreeMan, the first large-scale, multi-view dataset collected under real-world conditions.
1 code implementation • ICCV 2023 • Jie Yang, Ailing Zeng, Feng Li, Shilong Liu, Ruimao Zhang, Lei Zhang
Click-Pose explores how user feedback can cooperate with a neural keypoint detector to correct the predicted keypoints in an interactive way for a faster and more effective annotation process.
1 code implementation • 29 Jul 2023 • Zhendong Yang, Ailing Zeng, Chun Yuan, Yu Li
Different from the previous self-knowledge distillation, this stage finetunes the student's head with only 20% training time as a plug-and-play training strategy.
Ranked #1 on 2D Human Pose Estimation on COCO-WholeBody (using extra training data)
1 code implementation • 6 Jul 2023 • Zhijian Xu, Ailing Zeng, Qiang Xu
In this paper, we introduce FITS, a lightweight yet powerful model for time series analysis.
1 code implementation • NeurIPS 2023 • Jing Lin, Ailing Zeng, Shunlin Lu, Yuanhao Cai, Ruimao Zhang, Haoqian Wang, Lei Zhang
In this paper, we present Motion-X, a large-scale 3D expressive whole-body motion dataset.
1 code implementation • 12 Jun 2023 • Tianhe Ren, Shilong Liu, Feng Li, Hao Zhang, Ailing Zeng, Jie Yang, Xingyu Liao, Ding Jia, Hongyang Li, He Cao, Jianan Wang, Zhaoyang Zeng, Xianbiao Qi, Yuhui Yuan, Jianwei Yang, Lei Zhang
To address this issue, we develop a unified, highly modular, and lightweight codebase called detrex, which supports a majority of the mainstream DETR-based instance recognition algorithms, covering various fundamental tasks, including object detection, segmentation, and pose estimation.
1 code implementation • 20 May 2023 • Jie Yang, Bingliang Li, Fengyu Yang, Ailing Zeng, Lei Zhang, Ruimao Zhang
Extensive experiments demonstrate that DiffHOI significantly outperforms the state-of-the-art in regular detection (i.e., 41.50 mAP) and zero-shot detection.
Ranked #2 on Zero-Shot Human-Object Interaction Detection on HICO-DET (using extra training data)
3 code implementations • 25 Apr 2023 • Tianhe Ren, Jianwei Yang, Shilong Liu, Ailing Zeng, Feng Li, Hao Zhang, Hongyang Li, Zhaoyang Zeng, Lei Zhang
This work presents Focal-Stable-DINO, a strong and reproducible object detection model which achieves 64.6 AP on COCO val2017 and 64.8 AP on COCO test-dev using only 700M parameters without any test time augmentation.
Ranked #5 on Object Detection on COCO minival (using extra training data)
1 code implementation • ICCV 2023 • Xuan Ju, Ailing Zeng, Chenchen Zhao, Jianan Wang, Lei Zhang, Qiang Xu
While such a plug-and-play approach is appealing, the inevitable and uncertain conflicts between the original images produced from the frozen SD branch and the given condition incur significant challenges for the learnable branch, which essentially conducts image feature editing for condition enforcement.
1 code implementation • CVPR 2023 • Jing Lin, Ailing Zeng, Haoqian Wang, Lei Zhang, Yu Li
It is challenging to perform this task with a single network due to resolution issues, i.e., the face and hands are usually located in extremely small regions.
Ranked #3 on 3D Human Pose Estimation on UBody
1 code implementation • ICCV 2023 • Zhendong Yang, Ailing Zeng, Zhe Li, Tianke Zhang, Chun Yuan, Yu Li
We decompose the KD loss and find that its non-target component forces the student's non-target logits to match the teacher's; however, the sums of the student's and the teacher's non-target logits differ, which prevents them from being identical.
1 code implementation • 13 Mar 2023 • Feng Li, Ailing Zeng, Shilong Liu, Hao Zhang, Hongyang Li, Lei Zhang, Lionel M. Ni
Recent DEtection TRansformer-based (DETR) models have obtained remarkable performance.
1 code implementation • CVPR 2023 • Xuan Ju, Ailing Zeng, Jianan Wang, Qiang Xu, Lei Zhang
Humans have long been recorded in a variety of forms since antiquity.
no code implementations • 25 Feb 2023 • Hao Zhang, Hongyang Li, Ailing Zeng, Feng Li, Shilong Liu, Xingyu Liao, Lei Zhang
To address the second issue, we introduce an auxiliary learning task called Depth-aware Negative Suppression loss.
no code implementations • 18 Feb 2023 • Muxi Chen, Zhijian Xu, Ailing Zeng, Qiang Xu
In time series forecasting (TSF), we need to model the fine-grained temporal relationship within time series segments to generate accurate forecasting results given data in a look-back window.
3 code implementations • 3 Feb 2023 • Jie Yang, Ailing Zeng, Shilong Liu, Feng Li, Ruimao Zhang, Lei Zhang
This paper presents a novel end-to-end framework with Explicit box Detection for multi-person Pose estimation, called ED-Pose, which unifies the contextual learning between human-level (global) and keypoint-level (local) information.
Ranked #2 on 2D Human Pose Estimation on Human-Art
no code implementations • CVPR 2023 • Feng Li, Ailing Zeng, Shilong Liu, Hao Zhang, Hongyang Li, Lei Zhang, Lionel M. Ni
Recent DEtection TRansformer-based (DETR) models have obtained remarkable performance.
1 code implementation • 6 Sep 2022 • Zhendong Yang, Zhe Li, Ailing Zeng, Zexian Li, Chun Yuan, Yu Li
In this paper, we explore feature-based distillation for ViT.
4 code implementations • 26 May 2022 • Ailing Zeng, Muxi Chen, Lei Zhang, Qiang Xu
Recently, there has been a surge of Transformer-based solutions for the long-term time series forecasting (LTSF) task.
Ranked #1 on Time Series Forecasting on ETTh1 (96) Univariate
no code implementations • 28 Apr 2022 • Zhongang Cai, Daxuan Ren, Ailing Zeng, Zhengyu Lin, Tao Yu, Wenjia Wang, Xiangyu Fan, Yang Gao, Yifan Yu, Liang Pan, Fangzhou Hong, Mingyuan Zhang, Chen Change Loy, Lei Yang, Ziwei Liu
4D human sensing and modeling are fundamental tasks in vision and graphics with numerous applications.
1 code implementation • 16 Mar 2022 • Ailing Zeng, Xuan Ju, Lei Yang, Ruiyuan Gao, Xizhou Zhu, Bo Dai, Qiang Xu
This paper proposes DeciWatch, a simple baseline framework for video-based 2D/3D human pose estimation that can achieve a 10-fold efficiency improvement over existing works without any performance degradation.
Ranked #1 on 2D Human Pose Estimation on JHMDB (2D poses only)
2 code implementations • 27 Dec 2021 • Ailing Zeng, Lei Yang, Xuan Ju, Jiefeng Li, Jianyi Wang, Qiang Xu
With a simple yet effective motion-aware fully-connected network, SmoothNet improves the temporal smoothness of existing pose estimators significantly and enhances the estimation accuracy of those challenging frames as a side-effect.
no code implementations • ICLR 2022 • Minhao Liu, Ailing Zeng, Qiuxia Lai, Ruiyuan Gao, Min Li, Jing Qin, Qiang Xu
In this work, we propose a novel tree-structured wavelet neural network for time series signal analysis, namely T-WaveNet, by taking advantage of an inherent property of various types of signals, known as the dominant frequency range.
no code implementations • ICCV 2021 • Ailing Zeng, Xiao Sun, Lei Yang, Nanxuan Zhao, Minhao Liu, Qiang Xu
While the average prediction accuracy has been improved significantly over the years, the performance on hard poses with depth ambiguity, self-occlusion, and complex or rare poses is still far from satisfactory.
Ranked #23 on Skeleton Based Action Recognition on NTU RGB+D 120
1 code implementation • 7 Aug 2021 • Qiuxia Lai, Yu Li, Ailing Zeng, Minhao Liu, Hanqiu Sun, Qiang Xu
Extensive experiments show that the proposed IB-inspired spatial attention mechanism can yield attention maps that neatly highlight the regions of interest while suppressing backgrounds, and bootstrap standard DNN structures for visual recognition tasks (e.g., image classification, fine-grained recognition, cross-domain classification).
3 code implementations • ICCV 2021 • Jiefeng Li, Siyuan Bian, Ailing Zeng, Can Wang, Bo Pang, Wentao Liu, Cewu Lu
In light of this, we propose a novel regression paradigm with Residual Log-likelihood Estimation (RLE) to capture the underlying output distribution.
Ranked #59 on 3D Human Pose Estimation on Human3.6M
4 code implementations • 17 Jun 2021 • Minhao Liu, Ailing Zeng, Muxi Chen, Zhijian Xu, Qiuxia Lai, Lingna Ma, Qiang Xu
One unique property of time series is that the temporal relations are largely preserved after downsampling into two sub-sequences.
Ranked #1 on Time Series Forecasting on ETTh1 (24) Multivariate (using extra training data)
no code implementations • 30 May 2021 • Ailing Zeng, Minhao Liu, Zhiwei Liu, Ruiyuan Gao, Jing Qin, Qiang Xu
We propose a novel solution to a long-standing dilemma in the representation learning of graph neural networks (GNNs): how to effectively capture and represent useful information embedded in long-distance nodes to improve the performance of nodes with low homophily without degrading performance on nodes with high homophily.
no code implementations • 21 Apr 2021 • Yunyan Hong, Ailing Zeng, Min Li, Cewu Lu, Li Jiang, Qiang Xu
Video action recognition (VAR) is a primary task of video understanding, and untrimmed videos are more common in real-life scenes.
no code implementations • 10 Dec 2020 • Minhao Liu, Ailing Zeng, Qiuxia Lai, Qiang Xu
Motivated by the fact that usually a small subset of the frequency components carries the primary information for sensor data, we propose a novel tree-structured wavelet neural network for sensor data analysis, namely T-WaveNet.
1 code implementation • ECCV 2020 • Ailing Zeng, Xiao Sun, Fuyang Huang, Minhao Liu, Qiang Xu, Stephen Lin
With the reduced dimensionality of less relevant body areas, the training set distribution within network branches more closely reflects the statistics of local poses instead of global body poses, without sacrificing information important for joint inference.
Ranked #20 on Monocular 3D Human Pose Estimation on Human3.6M
no code implementations • 9 Dec 2019 • Fuyang Huang, Ailing Zeng, Minhao Liu, Qiuxia Lai, Qiang Xu
In this paper, we propose a two-stage fully 3D network, namely DeepFuse, to estimate human pose in 3D space by fusing body-worn Inertial Measurement Unit (IMU) data and multi-view images deeply.
Ranked #5 on 3D Human Pose Estimation on Total Capture
no code implementations • 26 Dec 2018 • Fuyang Huang, Ailing Zeng, Minhao Liu, Jing Qin, Qiang Xu
Experimental results show that the proposed structure-aware 3D hourglass network achieves a mean joint error of 7.4 mm on the MSRA dataset and 8.9 mm on the NYU dataset.