Search Results for author: Wentao Zhu

Found 63 papers, 24 papers with code

Efficient Action Counting with Dynamic Queries

1 code implementation 3 Mar 2024 Zishi Li, Xiaoxuan Ma, Qiuyan Shang, Wentao Zhu, Hai Ci, Yu Qiao, Yizhou Wang

Temporal repetition counting aims to quantify the repeated action cycles within a video.

Contrastive Learning

OpenMEDLab: An Open-source Platform for Multi-modality Foundation Models in Medicine

no code implementations 28 Feb 2024 Xiaosong Wang, Xiaofan Zhang, Guotai Wang, Junjun He, Zhongyu Li, Wentao Zhu, Yi Guo, Qi Dou, Xiaoxiao Li, Dequan Wang, Liang Hong, Qicheng Lao, Tong Ruan, Yukun Zhou, Yixue Li, Jie Zhao, Kang Li, Xin Sun, Lifeng Zhu, Shaoting Zhang

The emerging trend of advancing generalist artificial intelligence, such as GPTv4 and Gemini, has reshaped the landscape of research (academia and industry) in machine learning and many other research areas.

Transfer Learning

Language Models Represent Beliefs of Self and Others

no code implementations 28 Feb 2024 Wentao Zhu, Zhining Zhang, Yizhou Wang

Understanding and attributing mental states, known as Theory of Mind (ToM), emerges as a fundamental capability for human social reasoning.

Causal Inference

Real-time Holistic Robot Pose Estimation with Unknown States

1 code implementation 8 Feb 2024 Shikun Ban, Juling Fan, Wentao Zhu, Xiaoxuan Ma, Yu Qiao, Yizhou Wang

We propose an end-to-end pipeline for real-time, holistic robot pose estimation from a single RGB image, even in the absence of known robot states.

6D Pose Estimation using RGB Robot Pose Estimation

Efficient Selective Audio Masked Multimodal Bottleneck Transformer for Audio-Video Classification

no code implementations 8 Jan 2024 Wentao Zhu

To learn from multimodal videos effectively, in this work, we propose a novel audio-video recognition approach termed audio video Transformer, AVT, leveraging the effective spatio-temporal representation by the video Transformer to improve action recognition accuracy.

Action Recognition Contrastive Learning +2

Efficient Multiscale Multimodal Bottleneck Transformer for Audio-Video Classification

no code implementations 8 Jan 2024 Wentao Zhu

In recent years, researchers combine both audio and video signals to deal with challenges where actions are not well represented or captured by visual cues.

Representation Learning Video Classification

TPC-ViT: Token Propagation Controller for Efficient Vision Transformer

no code implementations 3 Jan 2024 Wentao Zhu

Previous approaches that employ gradual token reduction to address this challenge assume that token redundancy in one layer implies redundancy in all the following layers.

Token Reduction

Deformable Audio Transformer for Audio Event Detection

no code implementations 24 Dec 2023 Wentao Zhu

Hence, we introduce a learnable input adaptor to alleviate this issue, and DATAR achieves state-of-the-art performance.

Event Detection

ChimpACT: A Longitudinal Dataset for Understanding Chimpanzee Behaviors

1 code implementation NeurIPS 2023 Xiaoxuan Ma, Stephan P. Kaufhold, Jiajun Su, Wentao Zhu, Jack Terwilliger, Andres Meza, Yixin Zhu, Federico Rossano, Yizhou Wang

ChimpACT is both comprehensive and challenging, consisting of 163 videos with a cumulative 160,500 frames, each richly annotated with detection, identification, pose estimation, and fine-grained spatiotemporal behavior labels.

Action Detection Pose Estimation

A Multi-Scale Spatial Transformer U-Net for Simultaneously Automatic Reorientation and Segmentation of 3D Nuclear Cardiac Images

no code implementations 16 Oct 2023 Yangfan Ni, Duo Zhang, Gege Ma, Lijun Lu, Zhongke Huang, Wentao Zhu

Accurate reorientation and segmentation of the left ventricular (LV) is essential for the quantitative analysis of myocardial perfusion imaging (MPI), in which one critical step is to reorient the reconstructed transaxial nuclear cardiac images into standard short-axis slices for subsequent image processing.

LV Segmentation Segmentation

UPL-SFDA: Uncertainty-aware Pseudo Label Guided Source-Free Domain Adaptation for Medical Image Segmentation

1 code implementation 19 Sep 2023 Jianghao Wu, Guotai Wang, Ran Gu, Tao Lu, Yinan Chen, Wentao Zhu, Tom Vercauteren, Sébastien Ourselin, Shaoting Zhang

The different predictions in these duplicated heads are used to obtain pseudo labels for unlabeled target-domain images and their uncertainty to identify reliable pseudo labels.

Brain Segmentation Image Segmentation +5

BROW: Better featuRes fOr Whole slide image based on self-distillation

no code implementations 15 Sep 2023 Yuanfeng Wu, Shaojie Li, Zhiqiang Du, Wentao Zhu

Hence, we proposed BROW, a foundation model for extracting better feature representations for WSIs, which can be conveniently adapted to downstream tasks without or with slight fine-tuning.

Instance Segmentation Semantic Segmentation

Classification of lung cancer subtypes on CT images with synthetic pathological priors

no code implementations 9 Aug 2023 Wentao Zhu, Yuan Jin, Gege Ma, Geng Chen, Jan Egger, Shaoting Zhang, Dimitris N. Metaxas

The accurate diagnosis on pathological subtypes for lung cancer is of significant importance for the follow-up treatments and prognosis managements.

Computed Tomography (CT)

Human Motion Generation: A Survey

no code implementations 20 Jul 2023 Wentao Zhu, Xiaoxuan Ma, Dongwoo Ro, Hai Ci, Jinlu Zhang, Jiaxin Shi, Feng Gao, Qi Tian, Yizhou Wang

In this survey, we present a comprehensive literature review of human motion generation, which, to the best of our knowledge, is the first of its kind in this field.

Selective Structured State-Spaces for Long-Form Video Understanding

no code implementations CVPR 2023 Jue Wang, Wentao Zhu, Pichao Wang, Xiang Yu, Linda Liu, Mohamed Omar, Raffay Hamid

To address this limitation, we present a novel Selective S4 (i.e., S5) model that employs a lightweight mask generator to adaptively select informative image tokens, resulting in more efficient and accurate modeling of long-term spatiotemporal dependencies in videos.

Contrastive Learning Token Reduction +2

3D Human Mesh Estimation from Virtual Markers

1 code implementation CVPR 2023 Xiaoxuan Ma, Jiajun Su, Chunyu Wang, Wentao Zhu, Yizhou Wang

The advanced motion capture systems solve the problem by placing dense physical markers on the body surface, which allows to extract realistic meshes from their non-rigid motions.

3D Human Pose Estimation 3D Pose Estimation

Multiscale Audio Spectrogram Transformer for Efficient Audio Classification

no code implementations 19 Mar 2023 Wentao Zhu, Mohamed Omar

Audio event has a hierarchical architecture in both time and frequency and can be grouped together to construct more abstract semantic audio classes.

Audio Classification Representation Learning

Dynamic Inference With Grounding Based Vision and Language Models

no code implementations CVPR 2023 Burak Uzkent, Amanmeet Garg, Wentao Zhu, Keval Doshi, Jingru Yi, Xiaolong Wang, Mohamed Omar

For example, recent image and language models with more than 200M parameters have been proposed to learn visual grounding in the pre-training step and show impressive results on downstream vision and language tasks.

Language Modelling Referring Expression +3

GFPose: Learning 3D Human Pose Prior with Gradient Fields

1 code implementation CVPR 2023 Hai Ci, Mingdong Wu, Wentao Zhu, Xiaoxuan Ma, Hao Dong, Fangwei Zhong, Yizhou Wang

During the denoising process, GFPose implicitly incorporates pose priors in gradients and unifies various discriminative and generative tasks in an elegant framework.

Denoising Monocular 3D Human Pose Estimation +1

Intelligent Computing: The Latest Advances, Challenges and Future

no code implementations 21 Nov 2022 Shiqiang Zhu, Ting Yu, Tao Xu, Hongyang Chen, Schahram Dustdar, Sylvain Gigan, Deniz Gunduz, Ekram Hossain, Yaochu Jin, Feng Lin, Bo Liu, Zhiguo Wan, Ji Zhang, Zhifeng Zhao, Wentao Zhu, Zuoning Chen, Tariq Durrani, Huaimin Wang, Jiangxing Wu, Tongyi Zhang, Yunhe Pan

In recent years, we have witnessed the emergence of intelligent computing, a new computing paradigm that is reshaping traditional computing and promoting digital revolution in the era of big data, artificial intelligence and internet-of-things with new computing theories, architectures, methods, systems, and applications.

MotionBERT: A Unified Perspective on Learning Human Motion Representations

1 code implementation ICCV 2023 Wentao Zhu, Xiaoxuan Ma, Zhaoyang Liu, Libin Liu, Wayne Wu, Yizhou Wang

We present a unified perspective on tackling various human-centric video tasks by learning human motion representations from large-scale and heterogeneous data resources.

Ranked #1 on Monocular 3D Human Pose Estimation on Human3.6M (using extra training data)

3D Pose Estimation Action Recognition +3

AVT: Audio-Video Transformer for Multimodal Action Recognition

no code implementations Submitted to ICLR 2022 Wentao Zhu, Jingru Yi, Kevin Hsu, Xiaohang Sun, Xiang Hao, Linda Liu, Mohamed Omar

AVT uses a combination of video and audio signals to improve action recognition accuracy, leveraging the effective spatio-temporal representation by the video Transformer.

Action Recognition Audio Classification +3

Anti-Retroactive Interference for Lifelong Learning

1 code implementation 27 Aug 2022 Runqi Wang, Yuxiang Bao, Baochang Zhang, Jianzhuang Liu, Wentao Zhu, Guodong Guo

Second, according to the similarity between incremental knowledge and base knowledge, we design an adaptive fusion of incremental knowledge, which helps the model allocate capacity to the knowledge of different difficulties.

Meta-Learning

CelebV-HQ: A Large-Scale Video Facial Attributes Dataset

1 code implementation 25 Jul 2022 Hao Zhu, Wayne Wu, Wentao Zhu, Liming Jiang, Siwei Tang, Li Zhang, Ziwei Liu, Chen Change Loy

Large-scale datasets have played indispensable roles in the recent success of face generation/editing and significantly facilitated the advances of emerging research fields.

Attribute Face Generation +1

Faster VoxelPose: Real-time 3D Human Pose Estimation by Orthographic Projection

1 code implementation 22 Jul 2022 Hang Ye, Wentao Zhu, Chunyu Wang, Rujie Wu, Yizhou Wang

While the voxel-based methods have achieved promising results for multi-person 3D pose estimation from multi-cameras, they suffer from heavy computation burdens, especially for large scenes.

3D Multi-Person Pose Estimation 3D Pose Estimation

Adversarial Contrastive Self-Supervised Learning

no code implementations 26 Feb 2022 Wentao Zhu, Hang Shang, Tingxun Lv, Chao Liao, Sen yang, Ji Liu

Recently, learning from vast unlabeled data, especially self-supervised learning, has been emerging and attracted widespread attention.

Self-Supervised Learning

Associative Adversarial Learning Based on Selective Attack

no code implementations 28 Dec 2021 Runqi Wang, Xiaoyue Duan, Baochang Zhang, Song Xue, Wentao Zhu, David Doermann, Guodong Guo

We show that our method improves the recognition accuracy of adversarial training on ImageNet by 8.32% compared with the baseline.

Adversarial Robustness Few-Shot Learning +2

MoCaNet: Motion Retargeting in-the-wild via Canonicalization Networks

no code implementations 19 Dec 2021 Wentao Zhu, Zhuoqian Yang, Ziang Di, Wayne Wu, Yizhou Wang, Chen Change Loy

Trained with the canonicalization operations and the derived regularizations, our method learns to factorize a skeleton sequence into three independent semantic subspaces, i.e., motion, structure, and view angle.

3D Reconstruction Action Analysis +2

Self-Supervised Monocular Depth and Ego-Motion Estimation in Endoscopy: Appearance Flow to the Rescue

1 code implementation 15 Dec 2021 Shuwei Shao, Zhongcai Pei, Weihai Chen, Wentao Zhu, Xingming Wu, Dianmin Sun, Baochang Zhang

Recently, self-supervised learning technology has been applied to calculate depth and ego-motion from monocular videos, achieving remarkable performance in autonomous driving scenarios.

Depth Estimation Motion Estimation +1

Towards Comprehensive Monocular Depth Estimation: Multiple Heads Are Better Than One

no code implementations 16 Nov 2021 Shuwei Shao, Ran Li, Zhongcai Pei, Zhong Liu, Weihai Chen, Wentao Zhu, Xingming Wu, Baochang Zhang

In this work, we investigate the phenomenon and propose to integrate the strengths of multiple weak depth predictors to build a comprehensive and accurate depth predictor, which is critical for many real-world applications, e.g., 3D reconstruction.

3D Reconstruction Ensemble Learning +2

Joint Channel and Weight Pruning for Model Acceleration on Moblie Devices

1 code implementation 15 Oct 2021 Tianli Zhao, Xi Sheryl Zhang, Wentao Zhu, Jiaxing Wang, Sen yang, Ji Liu, Jian Cheng

In this paper, we present a unified framework with Joint Channel pruning and Weight pruning (JCW), and achieve a better Pareto-frontier between the latency and accuracy than previous model compression approaches.

Model Compression

SpeechNAS: Towards Better Trade-off between Latency and Accuracy for Large-Scale Speaker Verification

1 code implementation 18 Sep 2021 Wentao Zhu, Tianlong Kong, Shun Lu, Jixiang Li, Dawei Zhang, Feng Deng, Xiaorui Wang, Sen yang, Ji Liu

Recently, x-vector has been a successful and popular approach for speaker verification, which employs a time delay neural network (TDNN) and statistics pooling to extract speaker characterizing embedding from variable-length utterances.

Neural Architecture Search Speaker Recognition +2

Shifted Chunk Transformer for Spatio-Temporal Representational Learning

no code implementations NeurIPS 2021 Xuefan Zha, Wentao Zhu, Tingxun Lv, Sen yang, Ji Liu

However, the pure-Transformer based spatio-temporal learning can be prohibitively costly on memory and computation to extract fine-grained features from a tiny patch.

Action Anticipation Action Recognition +4

Test-Time Training for Deformable Multi-Scale Image Registration

no code implementations 25 Mar 2021 Wentao Zhu, Yufang Huang, Daguang Xu, Zhen Qian, Wei Fan, Xiaohui Xie

Registration is a fundamental task in medical robotics and is often a crucial step for many downstream tasks such as motion analysis, intra-operative tracking and image segmentation.

Image Registration Image Segmentation +1

Multi-Domain Image Completion for Random Missing Input Data

no code implementations 10 Jul 2020 Liyue Shen, Wentao Zhu, Xiaosong Wang, Lei Xing, John M. Pauly, Baris Turkbey, Stephanie Anne Harmon, Thomas Hogue Sanford, Sherif Mehralivand, Peter Choyke, Bradford Wood, Daguang Xu

Multi-domain data are widely leveraged in vision applications taking advantage of complementary information from different modalities, e.g., brain tumor segmentation from multi-parametric magnetic resonance imaging (MRI).

Brain Tumor Segmentation Disentanglement +3

LAMP: Large Deep Nets with Automated Model Parallelism for Image Segmentation

1 code implementation 22 Jun 2020 Wentao Zhu, Can Zhao, Wenqi Li, Holger Roth, Ziyue Xu, Daguang Xu

In this work, we introduce Large deep 3D ConvNets with Automated Model Parallelism (LAMP) and investigate the impact of both input's and deep 3D ConvNets' size on segmentation accuracy.

Image Segmentation Segmentation +1

TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting

no code implementations CVPR 2020 Zhuoqian Yang, Wentao Zhu, Wayne Wu, Chen Qian, Qiang Zhou, Bolei Zhou, Chen Change Loy

We present a lightweight video motion retargeting approach TransMoMo that is capable of transferring motion of a person in a source video realistically to another video of a target person.

motion retargeting

NeurReg: Neural Registration and Its Application to Image Segmentation

1 code implementation 4 Oct 2019 Wentao Zhu, Andriy Myronenko, Ziyue Xu, Wenqi Li, Holger Roth, Yufang Huang, Fausto Milletari, Daguang Xu

Furthermore, we design three segmentation frameworks based on the proposed registration framework: 1) atlas-based segmentation, 2) joint learning of both segmentation and registration tasks, and 3) multi-task learning with atlas-based segmentation as an intermediate feature.

Image Registration Image Segmentation +3

Cardiac Segmentation of LGE MRI with Noisy Labels

no code implementations 2 Oct 2019 Holger Roth, Wentao Zhu, Dong Yang, Ziyue Xu, Daguang Xu

In the first step, we register a small set of five LGE cardiac magnetic resonance (CMR) images with ground truth labels to a set of 40 target LGE CMR images without annotation.

Cardiac Segmentation Data Augmentation +2

Neural Multi-Scale Self-Supervised Registration for Echocardiogram Dense Tracking

no code implementations 18 Jun 2019 Wentao Zhu, Yufang Huang, Mani A. Vannan, Shizhen Liu, Daguang Xu, Wei Fan, Zhen Qian, Xiaohui Xie

In this work, we propose a neural multi-scale self-supervised registration (NMSR) method for automated myocardial and cardiac blood flow dense tracking.

Deep Learning for Automated Medical Image Analysis

no code implementations 12 Mar 2019 Wentao Zhu

Second, we will demonstrate how to use the weakly labeled data for the mammogram breast cancer diagnosis by efficiently designing deep learning for multi-instance learning.

Anatomy Lung Nodule Detection

AnatomyNet: Deep Learning for Fast and Fully Automated Whole-volume Segmentation of Head and Neck Anatomy

2 code implementations 15 Aug 2018 Wentao Zhu, Yufang Huang, Liang Zeng, Xuming Chen, Yong liu, Zhen Qian, Nan Du, Wei Fan, Xiaohui Xie

Methods: Our deep learning model, called AnatomyNet, segments OARs from head and neck CT images in an end-to-end fashion, receiving whole-volume HaN CT images as input and generating masks of all OARs of interest in one shot.

3D Medical Imaging Segmentation Anatomy

DeepLung: Deep 3D Dual Path Nets for Automated Pulmonary Nodule Detection and Classification

2 code implementations 25 Jan 2018 Wentao Zhu, Chaochun Liu, Wei Fan, Xiaohui Xie

DeepLung consists of two components, nodule detection (identifying the locations of candidate nodules) and classification (classifying candidate nodules into benign or malignant).

Classification Computed Tomography (CT) +2

Deep Multi-instance Networks with Sparse Label Assignment for Whole Mammogram Classification

1 code implementation 23 May 2017 Wentao Zhu, Qi Lou, Yeeleng Scott Vang, Xiaohui Xie

Inspired by the success of using deep convolutional features for natural image analysis and multi-instance learning (MIL) for labeling a set of instances/patches, we propose end-to-end trained deep multi-instance networks for mass classification based on whole mammogram without the aforementioned ROIs.

Classification General Classification +2

Leak Event Identification in Water Systems Using High Order CRF

no code implementations 12 Mar 2017 Qing Han, Wentao Zhu, Yang Shi

Today, detection of anomalous events in civil infrastructures (e.g., water pipe breaks and leaks) is time consuming and often takes hours or days.

Vocal Bursts Intensity Prediction

Deep Multi-instance Networks with Sparse Label Assignment for Whole Mammogram Classification

no code implementations 18 Dec 2016 Wentao Zhu, Qi Lou, Yeeleng Scott Vang, Xiaohui Xie

Inspired by the success of using deep convolutional features for natural image analysis and multi-instance learning for labeling a set of instances/patches, we propose end-to-end trained deep multi-instance networks for mass classification based on whole mammogram without the aforementioned costly need to annotate the training data.

Classification General Classification +1

Adversarial Deep Structural Networks for Mammographic Mass Segmentation

1 code implementation 18 Dec 2016 Wentao Zhu, Xiang Xiang, Trac. D. Tran, Xiaohui Xie

Experimental results on two public datasets, INbreast and DDSM-BCRP, show that our end-to-end network combined with adversarial training achieves the-state-of-the-art results.

Position Segmentation

Co-occurrence Feature Learning for Skeleton based Action Recognition using Regularized Deep LSTM Networks

no code implementations 24 Mar 2016 Wentao Zhu, Cuiling Lan, Junliang Xing, Wen-Jun Zeng, Yanghao Li, Li Shen, Xiaohui Xie

Skeleton based action recognition distinguishes human actions using the trajectories of skeleton joints, which provide a very good representation for describing actions.

Action Recognition Skeleton Based Action Recognition +1

Deep Trans-layer Unsupervised Networks for Representation Learning

no code implementations 27 Sep 2015 Wentao Zhu, Jun Miao, Laiyun Qing, Xilin Chen

Compared to traditional deep learning methods, the implemented feature learning method has much less parameters and is validated in several typical experiments, such as digit recognition on MNIST and MNIST variations, object recognition on Caltech 101 dataset and face verification on LFW dataset.

Face Verification Object Recognition +1

Constrained Extreme Learning Machines: A Study on Classification Cases

1 code implementation 25 Jan 2015 Wentao Zhu, Jun Miao, Laiyun Qing

Extreme learning machine (ELM) is an extremely fast learning method and has a powerful performance for pattern recognition tasks proven by enormous researches and engineers.

Classification General Classification
