Search Results for author: Zheng Zhu

Found 87 papers, 52 papers with code

DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation

no code implementations • 11 Mar 2024 • Guosheng Zhao, XiaoFeng Wang, Zheng Zhu, Xinze Chen, Guan Huang, Xiaoyi Bao, Xingang Wang

DriveDreamer-2 is the first world model to generate customized driving videos; it can generate uncommon driving videos (e.g., vehicles abruptly cut in) in a user-friendly manner.

Autonomous Driving Language Modelling +2

Paper
Add Code

WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens

no code implementations • 18 Jan 2024 • XiaoFeng Wang, Zheng Zhu, Guan Huang, Boyuan Wang, Xinze Chen, Jiwen Lu

World models play a crucial role in understanding and predicting the dynamics of the world, which is essential for video generation.

Video Editing Video Generation

Paper
Add Code

Generative Pretraining at Scale: Transformer-Based Encoding of Transactional Behavior for Fraud Detection

no code implementations • 22 Dec 2023 • Ze Yu Zhao, Zheng Zhu, Guilin Li, Wenhan Wang, Bo Wang

In this work, we introduce an innovative autoregressive model leveraging Generative Pretrained Transformer (GPT) architectures, tailored for fraud detection in payment systems.

Anomaly Detection Fraud Detection

Paper
Add Code

OpenStereo: A Comprehensive Benchmark for Stereo Matching and Strong Baseline

1 code implementation • 1 Dec 2023 • Xianda Guo, Juntao Lu, Chenming Zhang, Yiqi Wang, Yiqun Duan, Tian Yang, Zheng Zhu, Long Chen

Based on OpenStereo, we conducted experiments and have achieved or surpassed the performance metrics reported in the original paper.

Autonomous Driving Autonomous Navigation +1

252

Paper
Code

On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving

1 code implementation • 9 Nov 2023 • Licheng Wen, Xuemeng Yang, Daocheng Fu, XiaoFeng Wang, Pinlong Cai, Xin Li, Tao Ma, Yingxuan Li, Linran Xu, Dengke Shang, Zheng Zhu, Shaoyan Sun, Yeqi Bai, Xinyu Cai, Min Dou, Shuanglu Hu, Botian Shi, Yu Qiao

This has been a significant bottleneck, particularly in the development of common sense reasoning and nuanced scene understanding necessary for safe and reliable autonomous driving.

Autonomous Driving Common Sense Reasoning +4

264

Paper
Code

DREAM+: Efficient Dataset Distillation by Bidirectional Representative Matching

1 code implementation • 23 Oct 2023 • Yanqing Liu, Jianyang Gu, Kai Wang, Zheng Zhu, Kaipeng Zhang, Wei Jiang, Yang You

Dataset distillation plays a crucial role in creating compact datasets with similar training performance compared with original large-scale ones.

Transfer Learning

Paper
Code

DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving

no code implementations • 18 Sep 2023 • XiaoFeng Wang, Zheng Zhu, Guan Huang, Xinze Chen, Jiagang Zhu, Jiwen Lu

The established world model holds immense potential for the generation of high-quality driving videos, and driving policies for safe maneuvering.

Autonomous Driving Video Generation

Paper
Add Code

Introspective Deep Metric Learning

2 code implementations • 11 Sep 2023 • Chengkun Wang, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu

This paper proposes an introspective deep metric learning (IDML) framework for uncertainty-aware comparisons of images.

Image Retrieval Metric Learning

Paper
Code

Unified Single-Stage Transformer Network for Efficient RGB-T Tracking

1 code implementation • 26 Aug 2023 • Jianqiang Xia, Dianxi Shi, Ke Song, Linna Song, Xiaolei Wang, Songchang Jin, Li Zhou, Yu Cheng, Lei Jin, Zheng Zhu, Jianan Li, Gang Wang, Junliang Xing, Jian Zhao

With this structure, the network can extract fusion features of the template and search region under the mutual interaction of modalities.

Ranked #1 on Rgb-T Tracking on GTOT

feature selection Rgb-T Tracking

Paper
Code

Evidential Detection and Tracking Collaboration: New Problem, Benchmark and Algorithm for Robust Anti-UAV System

1 code implementation • 27 Jun 2023 • Xue-Feng Zhu, Tianyang Xu, Jian Zhao, Jia-Wei Liu, Kai Wang, Gang Wang, Jianan Li, Qiang Wang, Lei Jin, Zheng Zhu, Junliang Xing, Xiao-Jun Wu

Still, previous works have simplified such an anti-UAV task as a tracking problem, where the prior information of UAVs is always provided; such a scheme fails in real-world anti-UAV tasks (i.e., complex scenes, indeterminate-appear and -reappear UAVs, and real-time UAV surveillance).

210

Paper
Code

One at a Time: Progressive Multi-step Volumetric Probability Learning for Reliable 3D Scene Perception

no code implementations • 22 Jun 2023 • Bohan Li, Yasheng Sun, Jingxin Dong, Zheng Zhu, Jinming Liu, Xin Jin, Wenjun Zeng

Numerous studies have investigated the pivotal role of reliable 3D volume representation in scene perception tasks, such as multi-view stereo (MVS) and semantic scene completion (SSC).

Depth Estimation Representation Learning

Paper
Add Code

The 3rd Anti-UAV Workshop & Challenge: Methods and Results

no code implementations • 12 May 2023 • Jian Zhao, Jianan Li, Lei Jin, Jiaming Chu, Zhihao Zhang, Jun Wang, Jiangqiang Xia, Kai Wang, Yang Liu, Sadaf Gulshad, Jiaojiao Zhao, Tianyang Xu, XueFeng Zhu, Shihan Liu, Zheng Zhu, Guibo Zhu, Zechao Li, Zheng Wang, Baigui Sun, Yandong Guo, Shin'ichi Satoh, Junliang Xing, Jane Shen Shengmei

Second, we set up two tracks for the first time, i.e., Anti-UAV Tracking and Anti-UAV Detection & Tracking.

Object Tracking

Paper
Add Code

Multi-Prompt with Depth Partitioned Cross-Modal Learning

1 code implementation • 10 May 2023 • Yingjie Tian, Yiqi Wang, Xianda Guo, Zheng Zhu, Long Chen

In recent years, soft prompt learning methods have been proposed to fine-tune large-scale vision-language pre-trained models for various downstream tasks.

Domain Generalization

Paper
Code

OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction

1 code implementation • ICCV 2023 • Yunpeng Zhang, Zheng Zhu, Dalong Du

The vision-based perception for autonomous driving has undergone a transformation from the bird-eye-view (BEV) representations to the 3D semantic occupancy.

Ranked #3 on 3D Semantic Scene Completion from a single RGB image on SemanticKITTI

3D Semantic Occupancy Prediction 3D Semantic Scene Completion from a single RGB image +3

285

Paper
Code

DyGait: Exploiting Dynamic Representations for High-performance Gait Recognition

no code implementations • ICCV 2023 • Ming Wang, Xianda Guo, Beibei Lin, Tian Yang, Zheng Zhu, Lincheng Li, Shunli Zhang, Xin Yu

This is the first framework on gait recognition that is designed to focus on the extraction of dynamic features.

Gait Recognition Vocal Bursts Intensity Prediction

Paper
Add Code

SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving

2 code implementations • ICCV 2023 • Yi Wei, Linqing Zhao, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu

Towards a more comprehensive perception of a 3D scene, in this paper, we propose a SurroundOcc method to predict the 3D occupancy with multi-camera images.

3D Object Detection Autonomous Driving +2

680

Paper
Code

DiffBEV: Conditional Diffusion Model for Bird's Eye View Perception

1 code implementation • 15 Mar 2023 • Jiayu Zou, Zheng Zhu, Yun Ye, Xingang Wang

Diffusion models naturally have the ability to denoise noisy samples to the ideal data, which motivates us to utilize the diffusion model to get a better BEV representation.

3D Object Detection Autonomous Driving +3

230

Paper
Code

A Simple Baseline for Supervised Surround-view Depth Estimation

no code implementations • 14 Mar 2023 • Xianda Guo, Wenjie Yuan, Yunpeng Zhang, Tian Yang, Chenming Zhang, Zheng Zhu, Long Chen

The former is achieved by the self-attention module within each view, while the latter is realized by the adjacent attention module, which computes the attention across multi-cameras to exchange the multi-scale representations across surround-view feature maps.

Autonomous Driving Monocular Depth Estimation

Paper
Add Code

DiffusionDepth: Diffusion Denoising Approach for Monocular Depth Estimation

1 code implementation • 9 Mar 2023 • Yiqun Duan, Xianda Guo, Zheng Zhu

We propose DiffusionDepth, a new approach that reformulates monocular depth estimation as a denoising diffusion process.

Denoising Monocular Depth Estimation

246

Paper
Code

DiM: Distilling Dataset into Generative Model

2 code implementations • 8 Mar 2023 • Kai Wang, Jianyang Gu, Daquan Zhou, Zheng Zhu, Wei Jiang, Yang You

To the best of our knowledge, we are the first to achieve higher accuracy on complex architectures than simple ones, such as 75.1% with ResNet-18 and 72.6% with ConvNet-3 on ten images per class of CIFAR-10.

1,164

Paper
Code

OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception

1 code implementation • ICCV 2023 • XiaoFeng Wang, Zheng Zhu, Wenbo Xu, Yunpeng Zhang, Yi Wei, Xu Chi, Yun Ye, Dalong Du, Jiwen Lu, Xingang Wang

Towards a comprehensive benchmarking of surrounding perception algorithms, we propose OpenOccupancy, which is the first surrounding semantic occupancy perception benchmark.

Autonomous Driving Benchmarking

522

Paper
Code

DREAM: Efficient Dataset Distillation by Representative Matching

2 code implementations • ICCV 2023 • Yanqing Liu, Jianyang Gu, Kai Wang, Zheng Zhu, Wei Jiang, Yang You

Although there are various matching objectives, currently the strategy for selecting original images is limited to naive random sampling.

1,164

Paper
Code

DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation

1 code implementation • CVPR 2023 • Shuai Shen, Wenliang Zhao, Zibin Meng, Wanhua Li, Zheng Zhu, Jie Zhou, Jiwen Lu

In this way, the proposed DiffTalk is capable of producing high-quality talking head videos in synchronization with the source audio, and more importantly, it can be naturally generalized across different identities without any further fine-tuning.

Denoising Talking Head Generation

403

Paper
Code

Detachable Novel Views Synthesis of Dynamic Scenes Using Distribution-Driven Neural Radiance Fields

1 code implementation • 1 Jan 2023 • Boyu Zhang, Wenbo Xu, Zheng Zhu, Guan Huang

Specifically, it employs a neural representation to capture the scene distribution in the static background and a 6D-input NeRF to represent dynamic objects, respectively.

Autonomous Driving

Paper
Code

Are We Ready for Vision-Centric Driving Streaming Perception? The ASAP Benchmark

1 code implementation • CVPR 2023 • XiaoFeng Wang, Zheng Zhu, Yunpeng Zhang, Guan Huang, Yun Ye, Wenbo Xu, Ziwei Chen, Xingang Wang

To mitigate the problem, we propose the Autonomous-driving StreAming Perception (ASAP) benchmark, which is the first benchmark to evaluate the online performance of vision-centric perception in autonomous driving.

Depth Estimation Motion Forecasting

Paper
Code

Token-Label Alignment for Vision Transformers

1 code implementation • ICCV 2023 • Han Xiao, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu

Data mixing strategies (e.g., CutMix) have shown the ability to greatly improve the performance of convolutional neural networks (CNNs).

Image Classification Semantic Segmentation +1

Paper
Code

OPERA: Omni-Supervised Representation Learning with Hierarchical Supervisions

1 code implementation • ICCV 2023 • Chengkun Wang, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu

The pretrain-finetune paradigm in modern computer vision facilitates the success of self-supervised learning, which tends to achieve better transferability than supervised learning.

Image Classification object-detection +3

Paper
Code

A Simple Baseline for Multi-Camera 3D Object Detection

1 code implementation • 22 Aug 2022 • Yunpeng Zhang, Wenzhao Zheng, Zheng Zhu, Guan Huang, Jie Zhou, Jiwen Lu

First, we extract multi-scale features and generate the perspective object proposals on each monocular image.

Autonomous Driving Monocular 3D Object Detection +2

Paper
Code

Crafting Monocular Cues and Velocity Guidance for Self-Supervised Multi-Frame Depth Learning

1 code implementation • 19 Aug 2022 • XiaoFeng Wang, Zheng Zhu, Guan Huang, Xu Chi, Yun Ye, Ziwei Chen, Xingang Wang

In contrast, multi-frame depth estimation methods improve the depth accuracy thanks to the success of Multi-View Stereo (MVS), which directly makes use of geometric constraints.

Depth Estimation

Paper
Code

MonoViT: Self-Supervised Monocular Depth Estimation with a Vision Transformer

1 code implementation • 6 Aug 2022 • Chaoqiang Zhao, Youmin Zhang, Matteo Poggi, Fabio Tosi, Xianda Guo, Zheng Zhu, Guan Huang, Yang Tang, Stefano Mattoccia

Self-supervised monocular depth estimation is an attractive solution that does not require hard-to-source depth labels for training.

Ranked #1 on Monocular Depth Estimation on KITTI

Depth Prediction Monocular Depth Estimation +1

140

Paper
Code

Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis

1 code implementation • 24 Jul 2022 • Shuai Shen, Wanhua Li, Zheng Zhu, Yueqi Duan, Jie Zhou, Jiwen Lu

Thus the facial radiance field can be flexibly adjusted to the new identity with few reference images.

Talking Face Generation Talking Head Generation

329

Paper
Code

Shapley-NAS: Discovering Operation Contribution for Neural Architecture Search

1 code implementation • CVPR 2022 • Han Xiao, Ziwei Wang, Zheng Zhu, Jie Zhou, Jiwen Lu

Differentiable architecture search (DARTS) acquires the optimal architectures by optimizing the architecture parameters with gradient descent, which significantly reduces the search cost.

Ranked #1 on Neural Architecture Search on NAS-Bench-201, CIFAR-100

Neural Architecture Search

Paper
Code

OrdinalCLIP: Learning Rank Prompts for Language-Guided Ordinal Regression

1 code implementation • 6 Jun 2022 • Wanhua Li, Xiaoke Huang, Zheng Zhu, Yansong Tang, Xiu Li, Jie Zhou, Jiwen Lu

In this paper, we propose to learn the rank concepts from the rich semantic CLIP latent space.

Ranked #1 on Few-shot Age Estimation on MORPH Album2

Aesthetics Quality Assessment Few-shot Age Estimation +4

Paper
Code

Divide to Adapt: Mitigating Confirmation Bias for Domain Adaptation of Black-Box Predictors

1 code implementation • 28 May 2022 • Jianfei Yang, Xiangyu Peng, Kai Wang, Zheng Zhu, Jiashi Feng, Lihua Xie, Yang You

Domain Adaptation of Black-box Predictors (DABP) aims to learn a model on an unlabeled target domain supervised by a black-box predictor trained on a source domain.

Domain Adaptation Knowledge Distillation

Paper
Code

FaceMAE: Privacy-Preserving Face Recognition via Masked Autoencoders

1 code implementation • 23 May 2022 • Kai Wang, Bo Zhao, Xiangyu Peng, Zheng Zhu, Jiankang Deng, Xinchao Wang, Hakan Bilen, Yang You

Firstly, randomly masked face images are used to train the reconstruction module in FaceMAE.

Face Recognition Privacy Preserving +1

Paper
Code

BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving

1 code implementation • 19 May 2022 • Yunpeng Zhang, Zheng Zhu, Wenzhao Zheng, JunJie Huang, Guan Huang, Jie Zhou, Jiwen Lu

Specifically, BEVerse first performs shared feature extraction and lifting to generate 4D BEV representations from multi-timestamp and multi-view images.

Ranked #15 on Robust Camera Only 3D Object Detection on nuScenes-C

3D Object Detection Autonomous Driving +4

367

Paper
Code

Gait Recognition in the Wild: A Large-scale Benchmark and NAS-based Baseline

no code implementations • ICCV 2021 • Xianda Guo, Zheng Zhu, Tian Yang, Beibei Lin, JunJie Huang, Jiankang Deng, Guan Huang, Jie Zhou, Jiwen Lu

To the best of our knowledge, this is the first large-scale dataset for gait recognition in the wild.

Gait Recognition in the Wild Neural Architecture Search

Paper
Add Code

Reliable Label Correction is a Good Booster When Learning with Extremely Noisy Labels

1 code implementation • 30 Apr 2022 • Kai Wang, Xiangyu Peng, Shuo Yang, Jianfei Yang, Zheng Zhu, Xinchao Wang, Yang You

This paradigm, however, is prone to significant degeneration under heavy label noise, as the number of clean samples is too small for conventional methods to behave well.

Learning with noisy labels

Paper
Code

WebFace260M: A Benchmark for Million-Scale Deep Face Recognition

no code implementations • 21 Apr 2022 • Zheng Zhu, Guan Huang, Jiankang Deng, Yun Ye, JunJie Huang, Xinze Chen, Jiagang Zhu, Tian Yang, Dalong Du, Jiwen Lu, Jie Zhou

For a comprehensive evaluation of face matchers, three recognition tasks are performed under standard, masked and unbiased settings, respectively.

Face Recognition

Paper
Add Code

MVSTER: Epipolar Transformer for Efficient Multi-View Stereo

1 code implementation • 15 Apr 2022 • XiaoFeng Wang, Zheng Zhu, Fangbo Qin, Yun Ye, Guan Huang, Xu Chi, Yijia He, Xingang Wang

Therefore, we present MVSTER, which leverages the proposed epipolar Transformer to learn both 2D semantics and 3D spatial associations efficiently.

178

Paper
Code

HFT: Lifting Perspective Representations via Hybrid Feature Transformation

1 code implementation • 11 Apr 2022 • Jiayu Zou, Junrui Xiao, Zheng Zhu, JunJie Huang, Guan Huang, Dalong Du, Xingang Wang

In order to reap the benefits and avoid the drawbacks of CBFT and CFFT, we propose a novel framework with a Hybrid Feature Transformation module (HFT).

Autonomous Driving Decision Making +2

120

Paper
Code

SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation

1 code implementation • 7 Apr 2022 • Yi Wei, Linqing Zhao, Wenzhao Zheng, Zheng Zhu, Yongming Rao, Guan Huang, Jiwen Lu, Jie Zhou

In this paper, we propose a SurroundDepth method to incorporate the information from multiple surrounding views to predict depth maps across cameras.

Autonomous Driving Monocular Depth Estimation

236

Paper
Code

Decoupled Multi-task Learning with Cyclical Self-Regulation for Face Parsing

1 code implementation • CVPR 2022 • Qingping Zheng, Jiankang Deng, Zheng Zhu, Ying Li, Stefanos Zafeiriou

Specifically, DML-CSR designs a multi-task model which comprises face parsing, binary edge, and category edge detection.

Ranked #1 on Face Parsing on Helen

Edge Detection Face Parsing +1

21,275

Paper
Code

GaitStrip: Gait Recognition via Effective Strip-based Feature Representations and Multi-Level Framework

1 code implementation • 8 Mar 2022 • Ming Wang, Beibei Lin, Xianda Guo, Lincheng Li, Zheng Zhu, Jiande Sun, Shunli Zhang, Xin Yu

ECM consists of the Spatial-Temporal feature extractor (ST), the Frame-Level feature extractor (FL) and SPB, and has two obvious advantages: First, each branch focuses on a specific representation, which can be used to improve the robustness of the network.

Gait Recognition

Paper
Code

CAFE: Learning to Condense Dataset by Aligning Features

2 code implementations • CVPR 2022 • Kai Wang, Bo Zhao, Xiangyu Peng, Zheng Zhu, Shuo Yang, Shuo Wang, Guan Huang, Hakan Bilen, Xinchao Wang, Yang You

Dataset condensation aims at reducing the network training effort through condensing a cumbersome training set into a compact synthetic one.

Dataset Condensation

1,164

Paper
Code

Crafting Better Contrastive Views for Siamese Representation Learning

1 code implementation • CVPR 2022 • Xiangyu Peng, Kai Wang, Zheng Zhu, Mang Wang, Yang You

For high performance Siamese representation learning, one of the keys is to design good contrastive pairs.

Contrastive Learning Object Localization +1

278

Paper
Code

Dimension Embeddings for Monocular 3D Object Detection

no code implementations • CVPR 2022 • Yunpeng Zhang, Wenzhao Zheng, Zheng Zhu, Guan Huang, Dalong Du, Jie Zhou, Jiwen Lu

In this paper, we propose a general method to learn appropriate embeddings for dimension estimation in monocular 3D object detection.

Monocular 3D Object Detection Object +1

Paper
Add Code

BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View

2 code implementations • 22 Dec 2021 • JunJie Huang, Guan Huang, Zheng Zhu, Yun Ye, Dalong Du

As a fast version, BEVDet-Tiny scores 31.2% mAP and 39.2% NDS on the nuScenes val set.

Ranked #20 on Robust Camera Only 3D Object Detection on nuScenes-C

3D Object Detection Autonomous Driving +3

1,267

Paper
Code

DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting

1 code implementation • CVPR 2022 • Yongming Rao, Wenliang Zhao, Guangyi Chen, Yansong Tang, Zheng Zhu, Guan Huang, Jie Zhou, Jiwen Lu

In this work, we present a new framework for dense prediction by implicitly and explicitly leveraging the pre-trained knowledge from CLIP.

Image-text matching Instance Segmentation +6

489

Paper
Code

Face-NMS: A Core-set Selection Approach for Efficient Face Recognition

no code implementations • 10 Sep 2021 • Yunze Chen, JunJie Huang, Jiagang Zhu, Zheng Zhu, Tian Yang, Guan Huang, Dalong Du

The current research on this problem mainly focuses on designing an efficient Fully-connected layer (FC) to reduce GPU memory consumption caused by a large number of identities.

Face Recognition object-detection +1

Paper
Add Code

Masked Face Recognition Challenge: The InsightFace Track Report

1 code implementation • 18 Aug 2021 • Jiankang Deng, Jia Guo, Xiang An, Zheng Zhu, Stefanos Zafeiriou

In this workshop, we organize Masked Face Recognition (MFR) challenge and focus on bench-marking deep face recognition methods under the existence of facial masks.

Face Recognition

21,275

Paper
Code

Masked Face Recognition Challenge: The WebFace260M Track Report

no code implementations • 16 Aug 2021 • Zheng Zhu, Guan Huang, Jiankang Deng, Yun Ye, JunJie Huang, Xinze Chen, Jiagang Zhu, Tian Yang, Jia Guo, Jiwen Lu, Dalong Du, Jie Zhou

A second phase of the challenge runs until October 1, 2021, with an ongoing leaderboard.

Face Recognition

Paper
Add Code

Global Filter Networks for Image Classification

4 code implementations • NeurIPS 2021 • Yongming Rao, Wenliang Zhao, Zheng Zhu, Jiwen Lu, Jie Zhou

Recent advances in self-attention and pure multi-layer perceptrons (MLP) models for vision have shown great potential in achieving promising performance with fewer inductive biases.

Ranked #9 on Image Classification on Stanford Cars (using extra training data)

Classification Domain Generalization +1

391

Paper
Code

Structure-Aware Face Clustering on a Large-Scale Graph With $10^{7}$ Nodes

1 code implementation • CVPR 2021 • Shuai Shen, Wanhua Li, Zheng Zhu, Guan Huang, Dalong Du, Jiwen Lu, Jie Zhou

To address the dilemma of large-scale training and efficient inference, we propose the STructure-AwaRe Face Clustering (STAR-FC) method.

Clustering Face Clustering +1

Paper
Code

An Efficient Training Approach for Very Large Scale Face Recognition

1 code implementation • CVPR 2022 • Kai Wang, Shuo Wang, Panpan Zhang, Zhipeng Zhou, Zheng Zhu, Xiaobo Wang, Xiaojiang Peng, Baigui Sun, Hao Li, Yang You

This method adopts a Dynamic Class Pool (DCP) for storing and updating identity features dynamically, which can be regarded as a substitute for the FC layer.

Ranked #1 on Face Verification on IJB-C (training dataset metric)

Face Recognition Face Verification

Paper
Code

SIMPLE: SIngle-network with Mimicking and Point Learning for Bottom-up Human Pose Estimation

no code implementations • 6 Apr 2021 • Jiabin Zhang, Zheng Zhu, Jiwen Lu, JunJie Huang, Guan Huang, Jie Zhou

To make a better trade-off between accuracy and efficiency, we propose a novel multi-person pose estimation framework, SIngle-network with Mimicking and Point Learning for Bottom-up Human Pose Estimation (SIMPLE).

Human Detection Multi-Person Pose Estimation

Paper
Add Code

Structure-Aware Face Clustering on a Large-Scale Graph with $10^{7}$ Nodes

1 code implementation • 24 Mar 2021 • Shuai Shen, Wanhua Li, Zheng Zhu, Guan Huang, Dalong Du, Jiwen Lu, Jie Zhou

To address the dilemma of large-scale training and efficient inference, we propose the STructure-AwaRe Face Clustering (STAR-FC) method.

Clustering Face Clustering +1

Paper
Code

WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition

no code implementations • CVPR 2021 • Zheng Zhu, Guan Huang, Jiankang Deng, Yun Ye, JunJie Huang, Xinze Chen, Jiagang Zhu, Tian Yang, Jiwen Lu, Dalong Du, Jie Zhou

In this paper, we contribute a new million-scale face benchmark containing noisy 4M identities/260M faces (WebFace260M) and cleaned 2M identities/42M faces (WebFace42M) training data, as well as an elaborately designed time-constrained evaluation protocol.

Ranked #1 on Face Verification on IJB-C (training dataset metric)

Attribute Face Recognition +1

Paper
Add Code

Joint predictions of multi-modal ride-hailing demands: a deep multi-task multigraph learning-based approach

no code implementations • 11 Nov 2020 • Jintao Ke, Siyuan Feng, Zheng Zhu, Hai Yang, Jieping Ye

To address this issue, we propose a deep multi-task multi-graph learning approach, which combines two components: (1) multiple multi-graph convolutional (MGC) networks for predicting demands for different service modes, and (2) multi-task learning modules that enable knowledge sharing across multiple MGC networks.

Graph Learning Multi-Task Learning

Paper
Add Code

PiaNet: A pyramid input augmented convolutional neural network for GGO detection in 3D lung CT scans

no code implementations • 11 Sep 2020 • Weihua Liu, Xiabi Liu, Xiongbiao Luo, Murong Wang, Guanghui Han, Xinming Zhao, Zheng Zhu

In the first stage, the feature-extraction module is embedded into a classifier network that is trained on a large data set of GGO and non-GGO patches, which are generated by performing data augmentation from a small number of annotated CT scans.

Computed Tomography (CT) Data Augmentation +1

Paper
Add Code

AID: Pushing the Performance Boundary of Human Pose Estimation with Information Dropping Augmentation

2 code implementations • 17 Aug 2020 • Junjie Huang, Zheng Zhu, Guan Huang, Dalong Du

As AID successfully pushes the performance boundary of the human pose estimation problem by a considerable margin and sets a new state of the art, we hope AID will become a regular configuration for training human pose estimators.

Ranked #1 on Multi-Person Pose Estimation on COCO minival

Multi-Person Pose Estimation

5,006

Paper
Code

The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation

3 code implementations • CVPR 2020 • Junjie Huang, Zheng Zhu, Feng Guo, Guan Huang, Dalong Du

Specifically, by investigating the standard data processing in state-of-the-art approaches mainly including coordinate system transformation and keypoint format transformation (i.e., encoding and decoding), we find that the results obtained by common flipping strategy are unaligned with the original ones in inference.

Ranked #14 on Pose Estimation on COCO test-dev

Pose Estimation

5,006

Paper
Code

Predicting origin-destination ride-sourcing demand with a spatio-temporal encoder-decoder residual multi-graph convolutional network

1 code implementation • 17 Oct 2019 • Jintao Ke, Xiaoran Qin, Hai Yang, Zhengfei Zheng, Zheng Zhu, Jieping Ye

To overcome this challenge, we propose the Spatio-Temporal Encoder-Decoder Residual Multi-Graph Convolutional network (ST-ED-RMGC), a novel deep learning model for predicting ride-sourcing demand of various OD pairs.

Management

Paper
Code

Multi-Stage HRNet: Multiple Stage High-Resolution Network for Human Pose Estimation

no code implementations • 14 Oct 2019 • Junjie Huang, Zheng Zhu, Guan Huang

Human pose estimation is important for visual understanding tasks such as action recognition and human-computer interaction.

Action Recognition Multi-Person Pose Estimation +1

Paper
Add Code

The Field-of-View Constraint of Markers for Mobile Robot with Pan-Tilt Camera

no code implementations • 24 Sep 2019 • Hongxuan Ma, Wei Zou, Zheng Zhu, Siyang Sun, Zhaobing Kang

In the field of navigation and visual servoing, it is common to calculate relative pose from feature points on markers, so keeping markers in the camera's view is an important problem.

Position

Paper
Add Code

EPOSIT: An Absolute Pose Estimation Method for Pinhole and Fish-Eye Cameras

1 code implementation • 19 Sep 2019 • Zhaobing Kang, Wei Zou, Zheng Zhu, Chi Zhang, Hongxuan Ma

This paper presents a generic 6DOF camera pose estimation method, which can be used for both the pinhole camera and the fish-eye camera.

Pose Estimation

Paper
Code

Human Following for Wheeled Robot with Monocular Pan-tilt Camera

no code implementations • 13 Sep 2019 • Zheng Zhu, Hongxuan Ma, Wei Zou

Human following on mobile robots has witnessed significant advances due to its potential for real-world applications.

Optical Flow Estimation Visual Tracking

Paper
Add Code

High Performance Visual Object Tracking with Unified Convolutional Networks

no code implementations • 26 Aug 2019 • Zheng Zhu, Wei Zou, Guan Huang, Dalong Du, Chang Huang

In this paper, we propose an end-to-end framework to learn the convolutional features and perform the tracking process simultaneously, namely, a unified convolutional tracker (UCT).

Object Visual Object Tracking +1

Paper
Add Code

Camera Pose Correction in SLAM Based on Bias Values of Map Points

no code implementations • 24 Aug 2019 • Zhaobing Kang, Wei Zou, Zheng Zhu

Firstly, the relationship between the camera pose estimation error and bias values of map points is derived based on the optimized function in VSLAM.

feature selection Pose Estimation

Paper
Add Code

FastPose: Towards Real-time Pose Estimation and Tracking via Scale-normalized Multi-task Networks

no code implementations • 15 Aug 2019 • Jiabin Zhang, Zheng Zhu, Wei Zou, Peng Li, Yanwei Li, Hu Su, Guan Huang

Given the results of MTN, we adopt an occlusion-aware Re-ID feature strategy in the pose tracking module, where pose information is utilized to infer the occlusion state to make better use of the Re-ID feature.

Human Detection Multi-Person Pose Estimation +3

Paper
Add Code

Exploiting Offset-guided Network for Pose Estimation and Tracking

no code implementations • 4 Jun 2019 • Rui Zhang, Zheng Zhu, Peng Li, Rui Wu, Chaoxu Guo, Guan Huang, Hailun Xia

Human pose estimation has witnessed a significant advance thanks to the development of deep learning.

Human Detection Pose Estimation +1

Paper
Add Code

State-aware Re-identification Feature for Multi-target Multi-camera Tracking

no code implementations • 4 Jun 2019 • Peng Li, Jiabin Zhang, Zheng Zhu, Yanwei Li, Lu Jiang, Guan Huang

Multi-target Multi-camera Tracking (MTMCT) aims to extract the trajectories from videos captured by a set of cameras.

Paper
Add Code

Action Machine: Rethinking Action Recognition in Trimmed Videos

no code implementations • 14 Dec 2018 • Jiagang Zhu, Wei Zou, Liang Xu, Yiming Hu, Zheng Zhu, Manyu Chang, Jun-Jie Huang, Guan Huang, Dalong Du

On NTU RGB-D, Action Machine achieves state-of-the-art performance with top-1 accuracies of 97.2% and 94.3% on cross-view and cross-subject, respectively.

Ranked #1 on Action Recognition on UTD-MHAD

Action Recognition Multimodal Activity Recognition +3

Paper
Add Code

Identity-Enhanced Network for Facial Expression Recognition

no code implementations • 11 Dec 2018 • Yanwei Li, Xingang Wang, Shilei Zhang, Lingxi Xie, Wenqi Wu, Hongyuan Yu, Zheng Zhu

Facial expression recognition is a challenging task, arguably because of large intra-class variations and high inter-class similarities.

Facial Expression Recognition Facial Expression Recognition (FER) +1

Paper
Add Code

Attention-guided Unified Network for Panoptic Segmentation

no code implementations • CVPR 2019 • Yanwei Li, Xinze Chen, Zheng Zhu, Lingxi Xie, Guan Huang, Dalong Du, Xingang Wang

This paper studies panoptic segmentation, a recently proposed task which segments foreground (FG) objects at the instance level as well as background (BG) contents at the semantic level.

Ranked #24 on Panoptic Segmentation on COCO test-dev

Panoptic Segmentation Segmentation

Paper
Add Code

Multi-hierarchical Independent Correlation Filters for Visual Tracking

1 code implementation • 26 Nov 2018 • Shuai Bai, Zhiqun He, Ting-Bing Xu, Zheng Zhu, Yuan Dong, Hongliang Bai

For visual tracking, most of the traditional correlation filters (CF) based methods suffer from the bottleneck of feature redundancy and lack of motion information.

Motion Estimation Visual Object Tracking +1

116

Paper
Code

An Efficient Optical Flow Based Motion Detection Method for Non-stationary Scenes

no code implementations • 18 Nov 2018 • Junjie Huang, Wei Zou, Zheng Zhu, Jiagang Zhu

Real-time motion detection in non-stationary scenes is a difficult task due to dynamic backgrounds, changing foreground appearance and limited computational resources.

Motion Detection Motion Detection In Non-Stationary Scenes +1

Paper
Add Code

Optical Flow Based Online Moving Foreground Analysis

no code implementations • 18 Nov 2018 • Junjie Huang, Wei Zou, Zheng Zhu, Jiagang Zhu

Obtained by moving object detection, the foreground mask result is unshaped and cannot be directly used in most subsequent processes.

Clustering Moving Object Detection +2

Paper
Add Code

Distractor-aware Siamese Networks for Visual Object Tracking

1 code implementation • ECCV 2018 • Zheng Zhu, Qiang Wang, Bo Li, Wei Wu, Junjie Yan, Weiming Hu

During the off-line training phase, an effective sampling strategy is introduced to control this distribution and make the model focus on the semantic distractors.

Ranked #11 on Visual Object Tracking on VOT2017/18

Incremental Learning Object +2

1,253

Paper
Code

Optical Flow Based Real-time Moving Object Detection in Unconstrained Scenes

no code implementations • 13 Jul 2018 • Junjie Huang, Wei Zou, Jiagang Zhu, Zheng Zhu

Real-time moving object detection in unconstrained scenes is a difficult task due to dynamic backgrounds, changing foreground appearance and limited computational resources.

Moving Object Detection object-detection +1

Paper
Add Code

High Performance Visual Tracking With Siamese Region Proposal Network

5 code implementations • CVPR 2018 • Bo Li, Junjie Yan, Wei Wu, Zheng Zhu, Xiaolin Hu

Visual object tracking has been a fundamental topic in recent years and many deep learning based trackers have achieved state-of-the-art performance on multiple benchmarks.

Ranked #7 on Visual Object Tracking on VOT2017/18

Region Proposal Visual Object Tracking +2

1,253

Paper
Code

End-to-end Video-level Representation Learning for Action Recognition

1 code implementation • 11 Nov 2017 • Jiagang Zhu, Wei Zou, Zheng Zhu

From the frame/clip-level feature learning to the video-level representation building, deep learning methods in action recognition have developed rapidly in recent years.

Action Recognition Optical Flow Estimation +2

Paper
Code

UCT: Learning Unified Convolutional Networks for Real-time Visual Tracking

no code implementations • 10 Nov 2017 • Zheng Zhu, Guan Huang, Wei Zou, Dalong Du, Chang Huang

Convolutional neural networks (CNN) based tracking approaches have shown favorable performance in recent benchmarks.

Real-Time Visual Tracking

Paper
Add Code

End-to-end Flow Correlation Tracking with Spatial-temporal Attention

no code implementations • CVPR 2018 • Zheng Zhu, Wei Wu, Wei Zou, Junjie Yan

Discriminative correlation filters (DCF) with deep convolutional features have achieved favorable performance in recent tracking benchmarks.

Optical Flow Estimation

Paper
Add Code

Learning Gating ConvNet for Two-Stream based Methods in Action Recognition

1 code implementation • 12 Sep 2017 • Jiagang Zhu, Wei Zou, Zheng Zhu

For two-stream style methods in action recognition, the two streams' predictions are always fused by a weighted averaging scheme.

Action Classification Action Recognition +3

Paper
Code

Targeted Learning with Daily EHR Data

no code implementations • 27 May 2017 • Oleg Sofrygin, Zheng Zhu, Julie A Schmittdiel, Alyce S. Adams, Richard W. Grant, Mark J. Van Der Laan, Romain Neugebauer

Electronic health records (EHR) data provide a cost- and time-effective opportunity to conduct cohort studies of the effects of multiple time-point interventions in the diverse patient population found in real-world clinical settings.

Paper
Add Code

Evolutionary Approaches to Optimization Problems in Chimera Topologies

no code implementations • 17 Aug 2016 • Roberto Santana, Zheng Zhu, Helmut G. Katzgraber

In this paper we investigate for the first time the use of Evolutionary Algorithms (EAs) on Ising spin glass instances defined on the Chimera topology.

Evolutionary Algorithms

Paper
Add Code
