Search Results for author: Zhongdao Wang

Found 24 papers, 11 papers with code

OccGen: Generative Multi-modal 3D Occupancy Prediction for Autonomous Driving

no code implementations • 23 Apr 2024 • Guoqing Wang, Zhongdao Wang, Pin Tang, Jilai Zheng, Xiangxuan Ren, Bailan Feng, Chao Ma

Existing solutions for 3D semantic occupancy prediction typically treat the task as a one-shot 3D voxel-wise segmentation perception problem.

3D Semantic Occupancy Prediction Autonomous Driving +1

SparseOcc: Rethinking Sparse Latent Representation for Vision-Based Semantic Occupancy Prediction

no code implementations • 15 Apr 2024 • Pin Tang, Zhongdao Wang, Guoqing Wang, Jilai Zheng, Xiangxuan Ren, Bailan Feng, Chao Ma

Vision-based perception for autonomous driving requires an explicit modeling of a 3D space, where 2D latent representations are mapped and subsequent 3D operators are applied.

Autonomous Driving

OccFiner: Offboard Occupancy Refinement with Hybrid Propagation

no code implementations • 13 Mar 2024 • Hao Shi, Song Wang, Jiaming Zhang, Xiaoting Yin, Zhongdao Wang, Zhijian Zhao, Guangming Wang, Jianke Zhu, Kailun Yang, Kaiwei Wang

Vision-based occupancy prediction, also known as 3D Semantic Scene Completion (SSC), presents a significant challenge in computer vision.

3D Semantic Scene Completion

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

no code implementations • 7 Mar 2024 • Junsong Chen, Chongjian Ge, Enze Xie, Yue Wu, Lewei Yao, Xiaozhe Ren, Zhongdao Wang, Ping Luo, Huchuan Lu, Zhenguo Li

In this paper, we introduce PixArt-Σ, a Diffusion Transformer (DiT) model capable of directly generating images at 4K resolution.

4k Image Captioning +1

Divide and Conquer: Language Models can Plan and Self-Correct for Compositional Text-to-Image Generation

no code implementations • 28 Jan 2024 • Zhenyu Wang, Enze Xie, Aoxue Li, Zhongdao Wang, Xihui Liu, Zhenguo Li

Given a complex text prompt containing multiple concepts including objects, attributes, and relationships, the LLM agent initially decomposes it, which entails the extraction of individual objects, their associated attributes, and the prediction of a coherent scene layout.

Attribute Language Modelling +3

PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

2 code implementations • 30 Sep 2023 • Junsong Chen, Jincheng Yu, Chongjian Ge, Lewei Yao, Enze Xie, Yue Wu, Zhongdao Wang, James Kwok, Ping Luo, Huchuan Lu, Zhenguo Li

We hope PixArt-α will provide new insights to the AIGC community and startups to accelerate building their own high-quality yet low-cost generative models from scratch.

Image Generation Language Modelling

2,192

Identity-Seeking Self-Supervised Representation Learning for Generalizable Person Re-identification

1 code implementation • ICCV 2023 • Zhaopeng Dou, Zhongdao Wang, YaLi Li, Shengjin Wang

To overcome the barriers of data and annotation, we propose to utilize large-scale unsupervised data for training.

Generalizable Person Re-identification Representation Learning

MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation

1 code implementation • 19 Apr 2023 • Chongjian Ge, Junsong Chen, Enze Xie, Zhongdao Wang, Lanqing Hong, Huchuan Lu, Zhenguo Li, Ping Luo

These queries are then processed iteratively by a BEV-Evolving decoder, which selectively aggregates deep features from either LiDAR, cameras, or both modalities.

3D Object Detection Autonomous Driving +3

MetaBEV: Solving Sensor Failures for 3D Detection and Map Segmentation

no code implementations • ICCV 2023 • Chongjian Ge, Junsong Chen, Enze Xie, Zhongdao Wang, Lanqing Hong, Huchuan Lu, Zhenguo Li, Ping Luo

These queries are then processed iteratively by a BEV-Evolving decoder, which selectively aggregates deep features from either LiDAR, cameras, or both modalities.

3D Object Detection Autonomous Driving +3

Generalizable Re-Identification from Videos with Cycle Association

no code implementations • 7 Nov 2022 • Zhongdao Wang, Zhaopeng Dou, Jingwei Zhang, Liang Zheng, Yifan Sun, YaLi Li, Shengjin Wang

In this paper, we are interested in learning a generalizable person re-identification (re-ID) representation from unlabeled videos.

Domain Generalization Generalizable Person Re-identification +1

Reliability-Aware Prediction via Uncertainty Learning for Person Image Retrieval

1 code implementation • 24 Oct 2022 • Zhaopeng Dou, Zhongdao Wang, Weihua Chen, YaLi Li, Shengjin Wang

(3) the data uncertainty and the model uncertainty are jointly learned in a unified network, and they serve as two fundamental criteria for the reliability assessment: if a probe is high-quality (low data uncertainty) and the model is confident in the prediction of the probe (low model uncertainty), the final ranking will be assessed as reliable.

Image Retrieval Retrieval

Self-Supervised Learning via Maximum Entropy Coding

1 code implementation • 20 Oct 2022 • Xin Liu, Zhongdao Wang, YaLi Li, Shengjin Wang

To cope with this issue, we propose Maximum Entropy Coding (MEC), a more principled objective that explicitly optimizes on the structure of the representation, so that the learned representation is less biased and thus generalizes better to unseen downstream tasks.

Instance Segmentation object-detection +4

Adaptive Affinity for Associations in Multi-Target Multi-Camera Tracking

no code implementations • 14 Dec 2021 • Yunzhong Hou, Zhongdao Wang, Shengjin Wang, Liang Zheng

In this paper, we design experiments to verify such misfit between global re-ID feature distances and local matching in tracking, and propose a simple yet effective approach to adapt affinity estimations to corresponding matching scopes in MTMCT.

How to Synthesize a Large-Scale and Trainable Micro-Expression Dataset?

1 code implementation • 3 Dec 2021 • Yuchi Liu, Zhongdao Wang, Tom Gedeon, Liang Zheng

To this end, we develop a protocol to automatically synthesize large scale MiE training data that allow us to train improved recognition models for real-world test data.

Face Generation Micro-Expression Recognition

Do Different Tracking Tasks Require Different Appearance Models?

1 code implementation • NeurIPS 2021 • Zhongdao Wang, Hengshuang Zhao, Ya-Li Li, Shengjin Wang, Philip H. S. Torr, Luca Bertinetto

We show how most tracking tasks can be solved within this framework, and that the same appearance model can be successfully used to obtain results that are competitive against specialised methods for most of the tasks considered.

Ranked #2 on Video Object Segmentation on DAVIS 2017 (mIoU metric)

Multi-Object Tracking Multi-Object Tracking and Segmentation +10

336

Synthetic Data Are as Good as the Real for Association Knowledge Learning in Multi-object Tracking

no code implementations • 30 Jun 2021 • Yuchi Liu, Zhongdao Wang, Xiangxin Zhou, Liang Zheng

We show that compared with real data, association knowledge obtained from synthetic data can achieve very similar performance on real-world test sets without domain adaption techniques.

Domain Adaptation Multi-Object Tracking

CycAs: Self-supervised Cycle Association for Learning Re-identifiable Descriptions

no code implementations • ECCV 2020 • Zhongdao Wang, Jingwei Zhang, Liang Zheng, Yixuan Liu, Yifan Sun, Ya-Li Li, Shengjin Wang

This paper proposes a self-supervised learning method for the person re-identification (re-ID) problem, where existing unsupervised methods usually rely on pseudo labels, such as those from video tracklets or clustering.

Clustering Multi-Object Tracking +2

Circle Loss: A Unified Perspective of Pair Similarity Optimization

12 code implementations • CVPR 2020 • Yifan Sun, Changmao Cheng, Yuhan Zhang, Chi Zhang, Liang Zheng, Zhongdao Wang, Yichen Wei

This paper provides a pair similarity optimization viewpoint on deep feature learning, aiming to maximize the within-class similarity $s_p$ and minimize the between-class similarity $s_n$.

Ranked #1 on Face Verification on IJB-C (training dataset metric)

Face Recognition Face Verification +4

3,955

Locality Aware Appearance Metric for Multi-Target Multi-Camera Tracking

1 code implementation • 27 Nov 2019 • Yunzhong Hou, Liang Zheng, Zhongdao Wang, Shengjin Wang

Due to the continuity of target trajectories, tracking systems usually restrict their data association within a local neighborhood.

Multi-Object Tracking

Towards Real-Time Multi-Object Tracking

12 code implementations • ECCV 2020 • Zhongdao Wang, Liang Zheng, Yixuan Liu, Ya-Li Li, Shengjin Wang

In this paper, we propose an MOT system that allows target detection and appearance embedding to be learned in a shared model.

Ranked #4 on Multi-Object Tracking on HiEve

Multiple Object Tracking Multi-Task Learning +2

12,085

Softmax Dissection: Towards Understanding Intra- and Inter-class Objective for Embedding Learning

no code implementations • 4 Aug 2019 • Lanqing He, Zhongdao Wang, Ya-Li Li, Shengjin Wang

The softmax loss and its variants are widely used as objectives for embedding learning, especially in applications like face recognition.

Face Recognition Face Verification

Linkage Based Face Clustering via Graph Convolution Network

4 code implementations • CVPR 2019 • Zhongdao Wang, Liang Zheng, Ya-Li Li, Shengjin Wang

The key idea is that we find the local context in the feature space around an instance (face) contains rich information about the linkage relationship between this instance and its neighbors.

Clustering Face Clustering +1

699

Query Adaptive Late Fusion for Image Retrieval

no code implementations • 31 Oct 2018 • Zhongdao Wang, Liang Zheng, Shengjin Wang

That is to say, for some queries, a feature may be neither discriminative nor complementary to existing ones, while for other queries, the feature suffices.

Image Retrieval Person Recognition +2

Orientation Invariant Feature Embedding and Spatial Temporal Regularization for Vehicle Re-Identification

no code implementations • ICCV 2017 • Zhongdao Wang, Luming Tang, Xihui Liu, Zhuliang Yao, Shuai Yi, Jing Shao, Junjie Yan, Shengjin Wang, Hongsheng Li, Xiaogang Wang

In our vehicle ReID framework, an orientation invariant feature embedding module and a spatial-temporal regularization module are proposed.

Retrieval Vehicle Re-Identification
