Search Results for author: Wanli Ouyang

Found 269 papers, 121 papers with code

Physical formula enhanced multi-task learning for pharmacokinetics prediction

no code implementations • 16 Apr 2024 • Ruifeng Li, Dongzhan Zhou, Ancheng Shen, Ao Zhang, Mao Su, Mingqian Li, Hongyang Chen, Gang Chen, Yin Zhang, Shufei Zhang, Yuqiang Li, Wanli Ouyang

Overall, our work illustrates the benefits and potential of using PEMAL in AIDD and other scenarios with data scarcity and noise.

Drug Discovery Multi-Task Learning

Paper
Add Code

Taming Stable Diffusion for Text to 360° Panorama Image Generation

2 code implementations • 11 Apr 2024 • Cheng Zhang, Qianyi Wu, Camilo Cruz Gambardella, Xiaoshui Huang, Dinh Phung, Wanli Ouyang, Jianfei Cai

Generative models, e.g., Stable Diffusion, have enabled the creation of photorealistic images from text prompts.

Denoising Image Generation

132

Paper
Code

How Much Data are Enough? Investigating Dataset Requirements for Patch-Based Brain MRI Segmentation Tasks

no code implementations • 4 Apr 2024 • Dongang Wang, Peilin Liu, Hengrui Wang, Heidi Beadnall, Kain Kyle, Linda Ly, Mariano Cabezas, Geng Zhan, Ryan Sullivan, Weidong Cai, Wanli Ouyang, Fernando Calamante, Michael Barnett, Chenyu Wang

This paper focuses on an early stage phase of deep learning research, prior to model development, and proposes a strategic framework for estimating the amount of annotated data required to train patch-based segmentation networks.

MRI segmentation

Paper
Add Code

RS-Mamba for Large Remote Sensing Image Dense Prediction

1 code implementation • 3 Apr 2024 • Sijie Zhao, Hao Chen, Xueliang Zhang, Pengfeng Xiao, Lei Bai, Wanli Ouyang

RSM is specifically designed to capture the global context of remote sensing images with linear complexity, facilitating the effective processing of large VHR images.

Ranked #1 on Road Segmentation on Massachusetts Roads Dataset (F1 metric)

Building change detection for remote sensing images Change Detection +1

107

Paper
Code

RH20T-P: A Primitive-Level Robotic Dataset Towards Composable Generalization Agents

no code implementations • 28 Mar 2024 • Zeren Chen, Zhelun Shi, Xiaoya Lu, Lehan He, Sucheng Qian, Hao Shu Fang, Zhenfei Yin, Wanli Ouyang, Jing Shao, Yu Qiao, Cewu Lu, Lu Sheng

The ultimate goal of robotic learning is to acquire a comprehensive and generalizable robotic system capable of performing both seen skills within the training distribution and unseen skills in novel environments.

Motion Planning

Paper
Add Code

GVGEN: Text-to-3D Generation with Volumetric Representation

no code implementations • 19 Mar 2024 • Xianglong He, Junyi Chen, Sida Peng, Di Huang, Yangguang Li, Xiaoshui Huang, Chun Yuan, Wanli Ouyang, Tong He

To simplify the generation of GaussianVolume and empower the model to generate instances with detailed 3D geometry, we propose a coarse-to-fine pipeline.

3D Generation 3D Reconstruction +1

Paper
Add Code

DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM

no code implementations • 19 Mar 2024 • Yixuan Wu, Yizhou Wang, Shixiang Tang, Wenhao Wu, Tong He, Wanli Ouyang, Jian Wu, Philip Torr

We present DetToolChain, a novel prompting paradigm, to unleash the zero-shot object detection ability of multimodal large language models (MLLMs), such as GPT-4V and Gemini.

Object object-detection +3

Paper
Add Code

HVDistill: Transferring Knowledge from Images to Point Clouds via Unsupervised Hybrid-View Distillation

1 code implementation • 18 Mar 2024 • Sha Zhang, Jiajun Deng, Lei Bai, Houqiang Li, Wanli Ouyang, Yanyong Zhang

We present a hybrid-view-based knowledge distillation framework, termed HVDistill, to guide the feature learning of a point cloud neural network with a pre-trained image network in an unsupervised manner.

Knowledge Distillation NER +1

Paper
Code

Agent3D-Zero: An Agent for Zero-shot 3D Understanding

no code implementations • 18 Mar 2024 • Sha Zhang, Di Huang, Jiajun Deng, Shixiang Tang, Wanli Ouyang, Tong He, Yanyong Zhang

The ability to understand and reason about the 3D real world is a crucial milestone towards artificial general intelligence.

Language Modelling Scene Understanding

Paper
Add Code

PoIFusion: Multi-Modal 3D Object Detection via Fusion at Points of Interest

no code implementations • 14 Mar 2024 • Jiajun Deng, Sha Zhang, Feras Dayoub, Wanli Ouyang, Yanyong Zhang, Ian Reid

In this work, we present PoIFusion, a simple yet effective multi-modal 3D object detection framework to fuse the information of RGB images and LiDAR point clouds at the point of interest (abbreviated as PoI).

3D Object Detection Object +1

Paper
Add Code

LOCR: Location-Guided Transformer for Optical Character Recognition

no code implementations • 4 Mar 2024 • Yu Sun, Dongzhan Zhou, Chen Lin, Conghui He, Wanli Ouyang, Han-sen Zhong

Academic documents are packed with texts, equations, tables, and figures, requiring comprehensive understanding for accurate Optical Character Recognition (OCR).

Marketing Optical Character Recognition +1

Paper
Add Code

Auxiliary Tasks Enhanced Dual-affinity Learning for Weakly Supervised Semantic Segmentation

no code implementations • 2 Mar 2024 • Lian Xu, Mohammed Bennamoun, Farid Boussaid, Wanli Ouyang, Ferdous Sohel, Dan Xu

We propose AuxSegNet+, a weakly supervised auxiliary learning framework to explore the rich information from these saliency maps and the significant inter-task correlation between saliency detection and semantic segmentation.

Auxiliary Learning Multi-Label Image Classification +5

Paper
Add Code

ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models

1 code implementation • 22 Feb 2024 • Yanan Wu, Jie Liu, Xingyuan Bu, Jiaheng Liu, Zhanhui Zhou, Yuanxing Zhang, Chenchen Zhang, Zhiqi Bai, Haibin Chen, Tiezheng Ge, Wanli Ouyang, Wenbo Su, Bo Zheng

This paper introduces ConceptMath, a bilingual (English and Chinese), fine-grained benchmark that evaluates concept-wise mathematical reasoning of Large Language Models (LLMs).

Math Mathematical Reasoning

Paper
Code

MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues

no code implementations • 22 Feb 2024 • Ge Bai, Jie Liu, Xingyuan Bu, Yancheng He, Jiaheng Liu, Zhanhui Zhou, Zhuoran Lin, Wenbo Su, Tiezheng Ge, Bo Zheng, Wanli Ouyang

By conducting a detailed analysis of real multi-turn dialogue data, we construct a three-tier hierarchical ability taxonomy comprising 4208 turns across 1388 multi-turn dialogues in 13 distinct tasks.

Paper
Add Code

NeRF-Det++: Incorporating Semantic Cues and Perspective-aware Depth Supervision for Indoor Multi-View 3D Detection

1 code implementation • 22 Feb 2024 • Chenxi Huang, Yuenan Hou, Weicai Ye, Di Huang, Xiaoshui Huang, Binbin Lin, Deng Cai, Wanli Ouyang

We project the freely available 3D segmentation annotations onto the 2D plane and leverage the corresponding 2D semantic maps as the supervision signal, significantly enhancing the semantic awareness of multi-view detectors.

Depth Estimation Depth Prediction +1

Paper
Code

FiT: Flexible Vision Transformer for Diffusion Model

2 code implementations • 19 Feb 2024 • Zeyu Lu, Zidong Wang, Di Huang, Chengyue Wu, Xihui Liu, Wanli Ouyang, Lei Bai

Nature is infinitely resolution-free.

Image Cropping

321

Paper
Code

Emulated Disalignment: Safety Alignment for Large Language Models May Backfire!

1 code implementation • 19 Feb 2024 • Zhanhui Zhou, Jie Liu, Zhichen Dong, Jiaheng Liu, Chao Yang, Wanli Ouyang, Yu Qiao

Large language models (LLMs) need to undergo safety alignment to ensure safe conversations with humans.

Language Modelling

Paper
Code

Self-consistent Validation for Machine Learning Electronic Structure

no code implementations • 15 Feb 2024 • Gengyuan Hu, Gengchen Wei, Zekun Lou, Philip H. S. Torr, Wanli Ouyang, Han-sen Zhong, Chen Lin

Machine learning has emerged as a significant approach to efficiently tackle electronic structure problems.

Active Learning

Paper
Add Code

Revealing Decurve Flows for Generalized Graph Propagation

no code implementations • 13 Feb 2024 • Chen Lin, Liheng Ma, Yiyang Chen, Wanli Ouyang, Michael M. Bronstein, Philip H. S. Torr

Secondly, we propose the Continuous Unified Ricci Curvature (CURC), an extension of the celebrated Ollivier-Ricci Curvature for directed and weighted graphs.

Graph Learning

Paper
Add Code

ChemLLM: A Chemical Large Language Model

no code implementations • 10 Feb 2024 • Di Zhang, Wei Liu, Qian Tan, Jingdan Chen, Hang Yan, Yuliang Yan, Jiatong Li, Weiran Huang, Xiangyu Yue, Dongzhan Zhou, Shufei Zhang, Mao Su, Hansen Zhong, Yuqiang Li, Wanli Ouyang

ChemLLM beats GPT-3.5 on all three principal tasks in chemistry, i.e., name conversion, molecular caption, and reaction prediction, and surpasses GPT-4 on two of them.

Language Modelling Large Language Model +2

Paper
Add Code

CasCast: Skillful High-resolution Precipitation Nowcasting via Cascaded Modelling

no code implementations • 6 Feb 2024 • Junchao Gong, Lei Bai, Peng Ye, Wanghan Xu, Na Liu, Jianhua Dai, Xiaokang Yang, Wanli Ouyang

Precipitation nowcasting based on radar data plays a crucial role in extreme weather prediction and has broad implications for disaster management.

Management

Paper
Add Code

Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning

no code implementations • 4 Feb 2024 • Haoyi Zhu, Yating Wang, Di Huang, Weicai Ye, Wanli Ouyang, Tong He

In this study, we explore the influence of different observation spaces on robot learning, focusing on three predominant modalities: RGB, RGB-D, and point cloud.

Zero-shot Generalization

Paper
Add Code

Integration of cognitive tasks into artificial general intelligence test for large models

no code implementations • 4 Feb 2024 • Youzhi Qu, Chen Wei, Penghui Du, Wenxin Che, Chi Zhang, Wanli Ouyang, Yatao Bian, Feiyang Xu, Bin Hu, Kai Du, Haiyan Wu, Jia Liu, Quanying Liu

During the evolution of large models, performance evaluation is necessarily performed to assess their capabilities and ensure safety before practical application.

Paper
Add Code

A Comprehensive Survey on 3D Content Generation

1 code implementation • 2 Feb 2024 • Jian Liu, Xiaoshui Huang, Tianyu Huang, Lu Chen, Yuenan Hou, Shixiang Tang, Ziwei Liu, Wanli Ouyang, WangMeng Zuo, Junjun Jiang, Xianming Liu

Recent years have witnessed remarkable advances in artificial intelligence generated content (AIGC), with diverse input modalities, e.g., text, image, video, audio and 3D.

360

Paper
Code

ExtremeCast: Boosting Extreme Value Prediction for Global Weather Forecast

1 code implementation • 2 Feb 2024 • Wanghan Xu, Kang Chen, Tao Han, Hao Chen, Wanli Ouyang, Lei Bai

Data-driven weather forecast based on machine learning (ML) has experienced rapid development and demonstrated superior performance in the global medium-range forecast compared to traditional physics-based dynamical models.

Value prediction

Paper
Code

FengWu-GHR: Learning the Kilometer-scale Medium-range Global Weather Forecasting

no code implementations • 28 Jan 2024 • Tao Han, Song Guo, Fenghua Ling, Kang Chen, Junchao Gong, Jingjia Luo, Junxia Gu, Kan Dai, Wanli Ouyang, Lei Bai

Therefore, building a kilometer-scale global forecast model is a persistent pursuit in the meteorology domain.

Weather Forecasting

Paper
Add Code

From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities

no code implementations • 26 Jan 2024 • Chaochao Lu, Chen Qian, Guodong Zheng, Hongxing Fan, Hongzhi Gao, Jie Zhang, Jing Shao, Jingyi Deng, Jinlan Fu, Kexin Huang, Kunchang Li, Lijun Li, LiMin Wang, Lu Sheng, Meiqi Chen, Ming Zhang, Qibing Ren, Sirui Chen, Tao Gui, Wanli Ouyang, Yali Wang, Yan Teng, Yaru Wang, Yi Wang, Yinan He, Yingchun Wang, Yixu Wang, Yongting Zhang, Yu Qiao, Yujiong Shen, Yurong Mou, Yuxi Chen, Zaibin Zhang, Zhelun Shi, Zhenfei Yin, Zhipin Wang

Multi-modal Large Language Models (MLLMs) have shown impressive abilities in generating reasonable responses with respect to multi-modal contents.

Paper
Add Code

Observation-Guided Meteorological Field Downscaling at Station Scale: A Benchmark and a New Method

no code implementations • 22 Jan 2024 • Zili Liu, Hao Chen, Lei Bai, Wenyuan Li, Keyan Chen, Zhengyi Wang, Wanli Ouyang, Zhengxia Zou, Zhenwei Shi

In this paper, we extend meteorological downscaling to arbitrary scattered station scales, establish a brand new benchmark and dataset, and retrieve meteorological states at any given station location from a coarse-resolution meteorological field.

Super-Resolution Weather Forecasting

Paper
Add Code

Bilateral Reference for High-Resolution Dichotomous Image Segmentation

1 code implementation • 7 Jan 2024 • Peng Zheng, Dehong Gao, Deng-Ping Fan, Li Liu, Jorma Laaksonen, Wanli Ouyang, Nicu Sebe

It comprises two essential components: the localization module (LM) and the reconstruction module (RM) with our proposed bilateral reference (BiRef).

Ranked #1 on RGB Salient Object Detection on HRSOD (using extra training data)

Camouflaged Object Segmentation Dichotomous Image Segmentation +3

147

Paper
Code

Merging Vision Transformers from Different Tasks and Domains

no code implementations • 25 Dec 2023 • Peng Ye, Chenyu Huang, Mingzhu Shen, Tao Chen, Yongqi Huang, Yuning Zhang, Wanli Ouyang

This work aims to merge various Vision Transformers (ViTs) trained on different tasks (i.e., datasets with different object categories) or domains (i.e., datasets with the same categories but different environments) into one unified model that still performs well on each task or domain.

Paper
Add Code

Partial Fine-Tuning: A Successor to Full Fine-Tuning for Vision Transformers

no code implementations • 25 Dec 2023 • Peng Ye, Yongqi Huang, Chongjun Tu, Minglei Li, Tao Chen, Tong He, Wanli Ouyang

We first validate eight manually-defined partial fine-tuning strategies across various datasets and vision transformer architectures, and find that some partial fine-tuning strategies (e.g., FFN only or attention only) can achieve better performance with fewer tuned parameters than full fine-tuning, and that selecting appropriate layers is critical to partial fine-tuning.

Paper
Add Code

Efficient Architecture Search via Bi-level Data Pruning

no code implementations • 21 Dec 2023 • Chongjun Tu, Peng Ye, Weihao Lin, Hancheng Ye, Chong Yu, Tao Chen, Baopu Li, Wanli Ouyang

Improving the efficiency of Neural Architecture Search (NAS) is a challenging but significant task that has received much attention.

Neural Architecture Search

Paper
Add Code

Towards an end-to-end artificial intelligence driven global weather forecasting system

no code implementations • 18 Dec 2023 • Kun Chen, Lei Bai, Fenghua Ling, Peng Ye, Tao Chen, Jing-Jia Luo, Hao Chen, Yi Xiao, Kang Chen, Tao Han, Wanli Ouyang

Initial states are typically generated by traditional data assimilation components, which are computationally expensive and time-consuming.

Weather Forecasting

Paper
Add Code

ContraNovo: A Contrastive Learning Approach to Enhance De Novo Peptide Sequencing

1 code implementation • 18 Dec 2023 • Zhi Jin, Sheng Xu, Xiang Zhang, Tianze Ling, Nanqing Dong, Wanli Ouyang, Zhiqiang Gao, Cheng Chang, Siqi Sun

De novo peptide sequencing from mass spectrometry (MS) data is a critical task in proteomics research.

Contrastive Learning de novo peptide sequencing

Paper
Code

ResoNet: Robust and Explainable ENSO Forecasts with Hybrid Convolution and Transformer Networks

no code implementations • 16 Dec 2023 • Pumeng Lyu, Tao Tang, Fenghua Ling, Jing-Jia Luo, Niklas Boers, Wanli Ouyang, Lei Bai

Recent studies have shown that deep learning (DL) models can skillfully predict the El Niño-Southern Oscillation (ENSO) forecasts over 1.5 years ahead.

Paper
Add Code

FengWu-4DVar: Coupling the Data-driven Weather Forecasting Model with 4D Variational Assimilation

no code implementations • 16 Dec 2023 • Yi Xiao, Lei Bai, Wei Xue, Kang Chen, Tao Han, Wanli Ouyang

Weather forecasting is a crucial yet highly challenging task.

Weather Forecasting

Paper
Add Code

Point Transformer V3: Simpler, Faster, Stronger

3 code implementations • 15 Dec 2023 • Xiaoyang Wu, Li Jiang, Peng-Shuai Wang, Zhijian Liu, Xihui Liu, Yu Qiao, Wanli Ouyang, Tong He, Hengshuang Zhao

This paper is not motivated to seek innovation within the attention mechanism.

Ranked #1 on Semantic Segmentation on S3DIS (using extra training data)

3D Semantic Segmentation LIDAR Semantic Segmentation +1

1,990

Paper
Code

UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation

no code implementations • 14 Dec 2023 • Zexiang Liu, Yangguang Li, Youtian Lin, Xin Yu, Sida Peng, Yan-Pei Cao, Xiaojuan Qi, Xiaoshui Huang, Ding Liang, Wanli Ouyang

Recent advances in text-to-3D generation technology have significantly improved the conversion of textual descriptions into imaginative 3D objects with sound geometry and fine textures.

3D Generation Text to 3D

Paper
Add Code

A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning

1 code implementation • 12 Dec 2023 • Yinmin Zhang, Jie Liu, Chuming Li, Yazhe Niu, Yaodong Yang, Yu Liu, Wanli Ouyang

In this paper, from a novel perspective, we systematically study the challenges that remain in O2O RL and identify that the reason behind the slow improvement of the performance and the instability of online finetuning lies in the inaccurate Q-value estimation inherited from offline pretraining.

Offline RL

2,513

Paper
Code

Hulk: A Universal Knowledge Translator for Human-Centric Tasks

2 code implementations • 4 Dec 2023 • Yizhou Wang, Yixuan Wu, Shixiang Tang, Weizhen He, Xun Guo, Feng Zhu, Lei Bai, Rui Zhao, Jian Wu, Tong He, Wanli Ouyang

Human-centric perception tasks, e.g., pedestrian detection, skeleton-based action recognition, and pose estimation, have wide industrial applications, such as metaverse and sports analysis.

Ranked #1 on Pedestrian Image Caption on CUHK-PEDES

3D Human Pose Estimation Action Recognition +8

204

Paper
Code

GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition?

2 code implementations • 27 Nov 2023 • Wenhao Wu, Huanjin Yao, Mengxi Zhang, Yuxin Song, Wanli Ouyang, Jingdong Wang

Our study centers on the evaluation of GPT-4's linguistic and visual capabilities in zero-shot visual recognition tasks: Firstly, we explore the potential of its generated rich textual descriptions across various categories to enhance recognition performance without any training.

Zero-Shot Learning

831

Paper
Code

Point Cloud Pre-training with Diffusion Models

no code implementations • 25 Nov 2023 • Xiao Zheng, Xiaoshui Huang, Guofeng Mei, Yuenan Hou, Zhaoyang Lyu, Bo Dai, Wanli Ouyang, Yongshun Gong

This generator aggregates the features extracted by the backbone and employs them as the condition to guide the point-to-point recovery from the noisy point cloud, thereby assisting the backbone in capturing both local and global geometric priors as well as the global point density distribution of the object.

Point Cloud Pre-training

Paper
Add Code

Octavius: Mitigating Task Interference in MLLMs via LoRA-MoE

1 code implementation • 5 Nov 2023 • Zeren Chen, Ziqin Wang, Zhen Wang, Huayang Liu, Zhenfei Yin, Si Liu, Lu Sheng, Wanli Ouyang, Yu Qiao, Jing Shao

While this phenomenon has been overlooked in previous work, we propose a novel and extensible framework, called Octavius, for comprehensive studies and experimentation on multimodal learning with Multimodal Large Language Models (MLLMs).

Zero-shot Generalization

259

Paper
Code

GUPNet++: Geometry Uncertainty Propagation Network for Monocular 3D Object Detection

1 code implementation • 24 Oct 2023 • Yan Lu, Xinzhu Ma, Lei Yang, Tianzhu Zhang, Yating Liu, Qi Chu, Tong He, Yonghui Li, Wanli Ouyang

It models the uncertainty propagation relationship of the geometry projection during training, improving the stability and efficiency of the end-to-end model learning.

Monocular 3D Object Detection object-detection

126

Paper
Code

I$^2$MD: 3D Action Representation Learning with Inter- and Intra-modal Mutual Distillation

no code implementations • 24 Oct 2023 • Yunyao Mao, Jiajun Deng, Wengang Zhou, Zhenbo Lu, Wanli Ouyang, Houqiang Li

Different from existing distillation solutions that transfer the knowledge of a pre-trained and fixed teacher to the student, in CMD, the knowledge is continuously updated and bidirectionally distilled between modalities during pre-training.

Contrastive Learning Representation Learning

Paper
Add Code

MaskMA: Towards Zero-Shot Multi-Agent Decision Making with Mask-Based Collaborative Learning

no code implementations • 18 Oct 2023 • Jie Liu, Yinmin Zhang, Chuming Li, Chao Yang, Yaodong Yang, Yu Liu, Wanli Ouyang

Building a single generalist agent with strong zero-shot capability has recently sparked significant advancements.

Decision Making SMAC+

Paper
Add Code

PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm

1 code implementation • 12 Oct 2023 • Haoyi Zhu, Honghui Yang, Xiaoyang Wu, Di Huang, Sha Zhang, Xianglong He, Hengshuang Zhao, Chunhua Shen, Yu Qiao, Tong He, Wanli Ouyang

In this paper, we introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representation, thereby establishing a pathway to 3D foundational models.

Ranked #1 on 3D Semantic Segmentation on ScanNet++ (using extra training data)

3D Object Detection 3D Reconstruction +5

294

Paper
Code

UniPAD: A Universal Pre-training Paradigm for Autonomous Driving

1 code implementation • 12 Oct 2023 • Honghui Yang, Sha Zhang, Di Huang, Xiaoyang Wu, Haoyi Zhu, Tong He, Shixiang Tang, Hengshuang Zhao, Qibo Qiu, Binbin Lin, Xiaofei He, Wanli Ouyang

In the context of autonomous driving, the significance of effective feature learning is widely acknowledged.

3D Object Detection 3D Semantic Segmentation +3

122

Paper
Code

Rethinking the BERT-like Pretraining for DNA Sequences

no code implementations • 11 Oct 2023 • Chaoqi Liang, Weiqiang Bai, Lifeng Qiao, Yuchen Ren, Jianle Sun, Peng Ye, Hongliang Yan, Xinzhu Ma, WangMeng Zuo, Wanli Ouyang

To address this research gap, we first conducted a series of exploratory experiments and gained several insightful observations: 1) In the fine-tuning phase of downstream tasks, when using K-mer overlapping tokenization instead of K-mer non-overlapping tokenization, both overlapping and non-overlapping pretraining weights show consistent performance improvement. 2) During the pre-training process, using K-mer overlapping tokenization quickly produces clear K-mer embeddings and reduces the loss to a very low level, while using K-mer non-overlapping tokenization results in less distinct embeddings and continuously decreases the loss.

Paper
Add Code

Towards Fair and Comprehensive Comparisons for Image-Based 3D Object Detection

no code implementations • ICCV 2023 • Xinzhu Ma, Yongtao Wang, Yinmin Zhang, Zhiyi Xia, Yuan Meng, Zhihui Wang, Haojie Li, Wanli Ouyang

In this work, we build a modular-designed codebase, formulate strong training recipes, design an error diagnosis toolbox, and discuss current methods for image-based 3D object detection.

3D Object Detection Object +1

Paper
Add Code

Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization

1 code implementation • 5 Oct 2023 • Zhanhui Zhou, Jie Liu, Chao Yang, Jing Shao, Yu Liu, Xiangyu Yue, Wanli Ouyang, Yu Qiao

A single language model (LM), despite aligning well with an average labeler through reinforcement learning from human feedback (RLHF), may not universally suit diverse human preferences.

Language Modelling Long Form Question Answering

Paper
Code

Understanding Masked Autoencoders From a Local Contrastive Perspective

no code implementations • 3 Oct 2023 • Xiaoyu Yue, Lei Bai, Meng Wei, Jiangmiao Pang, Xihui Liu, Luping Zhou, Wanli Ouyang

Masked AutoEncoder (MAE) has revolutionized the field of self-supervised learning with its simple yet effective masking and reconstruction strategies.

Contrastive Learning Data Augmentation +1

Paper
Add Code

RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models

1 code implementation • 1 Oct 2023 • Zekun Moore Wang, Zhongyuan Peng, Haoran Que, Jiaheng Liu, Wangchunshu Zhou, Yuhan Wu, Hongcheng Guo, Ruitong Gan, Zehao Ni, Man Zhang, Zhaoxiang Zhang, Wanli Ouyang, Ke Xu, Wenhu Chen, Jie Fu, Junran Peng

The advent of Large Language Models (LLMs) has paved the way for complex tasks such as role-playing, which enhances user interactions by enabling models to imitate various characters.

Benchmarking

383

Paper
Code

NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space

1 code implementation • ICCV 2023 • Jiawei Yao, Chuming Li, Keqiang Sun, Yingjie Cai, Hao Li, Wanli Ouyang, Hongsheng Li

Monocular 3D Semantic Scene Completion (SSC) has garnered significant attention in recent years due to its potential to predict complex semantics and geometry shapes from a single image, requiring no 3D inputs.

Ranked #1 on 3D Semantic Scene Completion from a single RGB image on NYUv2

3D Semantic Scene Completion from a single 2D image 3D Semantic Scene Completion from a single RGB image

Paper
Code

Improving Multiple Sclerosis Lesion Segmentation Across Clinical Sites: A Federated Learning Approach with Noise-Resilient Training

no code implementations • 31 Aug 2023 • Lei Bai, Dongang Wang, Michael Barnett, Mariano Cabezas, Weidong Cai, Fernando Calamante, Kain Kyle, Dongnan Liu, Linda Ly, Aria Nguyen, Chun-Chien Shieh, Ryan Sullivan, Hengrui Wang, Geng Zhan, Wanli Ouyang, Chenyu Wang

Our approach enables collaboration among multiple clinical sites without compromising data privacy under a federated learning paradigm that incorporates a noise-robust training strategy based on label correction.

Federated Learning Lesion Segmentation

Paper
Add Code

DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior

1 code implementation • 29 Aug 2023 • Xinqi Lin, Jingwen He, Ziyan Chen, Zhaoyang Lyu, Bo Dai, Fanghua Yu, Wanli Ouyang, Yu Qiao, Chao Dong

We present DiffBIR, a general restoration pipeline that could handle different blind image restoration tasks in a unified framework.

Ranked #1 on Blind Face Restoration on LFW

Blind Face Restoration Image Denoising +2

2,986

Paper
Code

Boosting Residual Networks with Group Knowledge

1 code implementation • 26 Aug 2023 • Shengji Tang, Peng Ye, Baopu Li, Weihao Lin, Tao Chen, Tong He, Chong Yu, Wanli Ouyang

Specifically, we implicitly divide all subnets into hierarchical groups by subnet-in-subnet sampling, aggregate the knowledge of different subnets in each group during training, and exploit upper-level group knowledge to supervise lower-level subnet groups.

Knowledge Distillation

Paper
Code

STEERER: Resolving Scale Variations for Counting and Localization via Selective Inheritance Learning

1 code implementation • ICCV 2023 • Tao Han, Lei Bai, Lingbo Liu, Wanli Ouyang

Scale variation is a deep-rooted problem in object counting, which has not been effectively addressed by existing scale-aware algorithms.

feature selection Object Counting

Paper
Code

Masked Motion Predictors are Strong 3D Action Representation Learners

1 code implementation • ICCV 2023 • Yunyao Mao, Jiajun Deng, Wengang Zhou, Yao Fang, Wanli Ouyang, Houqiang Li

To be specific, the proposed MAMP takes as input the masked spatio-temporal skeleton sequence and predicts the corresponding temporal motion of the masked human joints.

Ranked #5 on Skeleton Based Action Recognition on NTU RGB+D 120

motion prediction Skeleton Based Action Recognition

Paper
Code

Experts Weights Averaging: A New General Training Scheme for Vision Transformers

no code implementations • 11 Aug 2023 • Yongqi Huang, Peng Ye, Xiaoshui Huang, Sheng Li, Tao Chen, Tong He, Wanli Ouyang

As Vision Transformers (ViTs) are gradually surpassing CNNs in various visual tasks, one may ask whether a training scheme specifically for ViTs exists that can also improve performance without increasing inference cost.

Paper
Add Code

MCTformer+: Multi-Class Token Transformer for Weakly Supervised Semantic Segmentation

1 code implementation • 6 Aug 2023 • Lian Xu, Mohammed Bennamoun, Farid Boussaid, Hamid Laga, Wanli Ouyang, Dan Xu

Building upon the observation that the attended regions of the one-class token in the standard vision transformer can contribute to a class-agnostic localization map, we explore the potential of the transformer model to capture class-specific attention for class-discriminative object localization by learning multiple class tokens.

Object Localization Weakly supervised Semantic Segmentation +1

139

Paper
Code

Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning

no code implementations • 24 Jul 2023 • Chuming Li, Ruonan Jia, Jie Liu, Yinmin Zhang, Yazhe Niu, Yaodong Yang, Yu Liu, Wanli Ouyang

Model-based reinforcement learning (RL) has demonstrated remarkable successes on a range of continuous control tasks due to its high sample efficiency.

Continuous Control Model-based Reinforcement Learning +1

Paper
Add Code

A general Temperature-Guided Language model to engineer enhanced Stability and Activity in Proteins

no code implementations • 24 Jul 2023 • Pan Tan, Mingchen Li, Yuanxi Yu, Fan Jiang, Lirong Zheng, Banghao Wu, Xinyu Sun, Liqi Kang, Jie Song, Liang Zhang, Yi Xiong, Wanli Ouyang, Zhiqiang Hu, Guisheng Fan, Yufeng Pei, Liang Hong

Designing protein mutants with high stability and activity is a critical yet challenging task in protein engineering.

Language Modelling

Paper
Add Code

Meta-Transformer: A Unified Framework for Multimodal Learning

1 code implementation • 20 Jul 2023 • Yiyuan Zhang, Kaixiong Gong, Kaipeng Zhang, Hongsheng Li, Yu Qiao, Wanli Ouyang, Xiangyu Yue

Multimodal learning aims to build models that can process and relate information from multiple modalities.

Time Series

1,437

Paper
Code

What Can Simple Arithmetic Operations Do for Temporal Modeling?

2 code implementations • ICCV 2023 • Wenhao Wu, Yuxin Song, Zhun Sun, Jingdong Wang, Chang Xu, Wanli Ouyang

We conduct comprehensive ablation studies on the instantiation of ATMs and demonstrate that this module provides powerful temporal modeling capability at a low computational cost.

Ranked #4 on Action Recognition on Something-Something V1

Action Classification Action Recognition +1

Paper
Code

UniG3D: A Unified 3D Object Generation Dataset

no code implementations • 19 Jun 2023 • Qinghong Sun, Yangguang Li, Zexiang Liu, Xiaoshui Huang, Fenggang Liu, Xihui Liu, Wanli Ouyang, Jing Shao

However, the quality and diversity of existing 3D object generation methods are constrained by the inadequacies of existing 3D object datasets, including issues related to text quality, the incompleteness of multi-modal data representation encompassing 2D rendered images and 3D assets, as well as the size of the dataset.

Autonomous Driving Object

Paper
Add Code

MotionGPT: Finetuned LLMs Are General-Purpose Motion Generators

no code implementations • 19 Jun 2023 • Yaqi Zhang, Di Huang, Bin Liu, Shixiang Tang, Yan Lu, Lu Chen, Lei Bai, Qi Chu, Nenghai Yu, Wanli Ouyang

Generating realistic human motion from given action descriptions has advanced significantly because of the emerging demand for digital humans.

Paper
Add Code

Adaptive Hierarchical SpatioTemporal Network for Traffic Forecasting

no code implementations • 15 Jun 2023 • YiRong Chen, Ziyue Li, Wanli Ouyang, Michael Lepech

In this work, we propose an Adaptive Hierarchical SpatioTemporal Network (AHSTN) to promote traffic forecasting by exploiting the spatial hierarchy and modeling multi-scale spatial correlations.

Paper
Add Code

Instruct-ReID: A Multi-purpose Person Re-identification Task with Instructions

1 code implementation • 13 Jun 2023 • Weizhen He, Yiheng Deng, Shixiang Tang, Qihao Chen, Qingsong Xie, Yizhou Wang, Lei Bai, Feng Zhu, Rui Zhao, Wanli Ouyang, Donglian Qi, Yunfeng Yan

This paper strives to resolve this problem by proposing a new instruct-ReID task that requires the model to retrieve images according to the given image or language instructions.

Person Re-Identification

Paper
Code

LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark

1 code implementation • NeurIPS 2023 • Zhenfei Yin, Jiong Wang, JianJian Cao, Zhelun Shi, Dingning Liu, Mukai Li, Lu Sheng, Lei Bai, Xiaoshui Huang, Zhiyong Wang, Jing Shao, Wanli Ouyang

To the best of our knowledge, we present one of the very first open-source endeavors in the field, LAMM, encompassing a Language-Assisted Multi-Modal instruction tuning dataset, framework, and benchmark.

259

Paper
Code

Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection

1 code implementation • CVPR 2023 • Yingjie Wang, Jiajun Deng, Yao Li, Jinshui Hu, Cong Liu, Yu Zhang, Jianmin Ji, Wanli Ouyang, Yanyong Zhang

LiDAR and Radar are two complementary sensing approaches in that LiDAR specializes in capturing an object's 3D shape while Radar provides longer detection ranges as well as velocity hints.

object-detection Object Detection

Paper
Code

PVT-SSD: Single-Stage 3D Object Detector with Point-Voxel Transformer

1 code implementation • CVPR 2023 • Honghui Yang, Wenxiao Wang, Minghao Chen, Binbin Lin, Tong He, Hua Chen, Xiaofei He, Wanli Ouyang

The key to associating the two different representations is our introduced input-dependent Query Initialization module, which could efficiently generate reference points and content queries.

Autonomous Driving Quantization

Paper
Code

Clothes-Invariant Feature Learning by Causal Intervention for Clothes-Changing Person Re-identification

no code implementations • 10 May 2023 • Xulin Li, Yan Lu, Bin Liu, Yuenan Hou, Yating Liu, Qi Chu, Wanli Ouyang, Nenghai Yu

Clothes-invariant feature extraction is critical to the clothes-changing person re-identification (CC-ReID).

Clothes Changing Person Re-Identification

Paper
Add Code

Stimulative Training++: Go Beyond The Performance Limits of Residual Networks

no code implementations • 4 May 2023 • Peng Ye, Tong He, Shengji Tang, Baopu Li, Tao Chen, Lei Bai, Wanli Ouyang

In this work, we aim to re-investigate the training process of residual networks from a novel social psychology perspective of loafing, and further propose a new training scheme as well as three improved strategies for boosting residual networks beyond their performance limits.

Paper
Add Code

FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead

1 code implementation • 6 Apr 2023 • Kang Chen, Tao Han, Junchao Gong, Lei Bai, Fenghua Ling, Jing-Jia Luo, Xi Chen, Leiming Ma, Tianning Zhang, Rui Su, Yuanzheng Ci, Bin Li, Xiaokang Yang, Wanli Ouyang

We present FengWu, an advanced data-driven global medium-range weather forecast system based on Artificial Intelligence (AI).

Paper
Code

Automatically Predict Material Properties with Microscopic Image Example Polymer Compatibility

no code implementations • 22 Mar 2023 • Zhilong Liang, Zhenzhi Tan, Ruixin Hong, Wanli Ouyang, Jinying Yuan, ChangShui Zhang

Computer image recognition with machine learning methods can compensate for the shortcomings of manual judging, giving accurate and quantitative judgement.

Transfer Learning

Paper
Add Code

HumanBench: Towards General Human-centric Perception with Projector Assisted Pretraining

1 code implementation • CVPR 2023 • Shixiang Tang, Cheng Chen, Qingsong Xie, Meilin Chen, Yizhou Wang, Yuanzheng Ci, Lei Bai, Feng Zhu, Haiyang Yang, Li Yi, Rui Zhao, Wanli Ouyang

Specifically, we propose HumanBench, based on existing datasets, to comprehensively evaluate on common ground the generalization abilities of different pretraining methods on 19 datasets from 6 diverse downstream tasks, including person ReID, pose estimation, human parsing, pedestrian attribute recognition, pedestrian detection, and crowd counting.

Ranked #1 on Pedestrian Attribute Recognition on PA-100K (using extra training data)

Attribute Autonomous Driving +5

204

Paper
Code

UniHCP: A Unified Model for Human-Centric Perceptions

1 code implementation • CVPR 2023 • Yuanzheng Ci, Yizhou Wang, Meilin Chen, Shixiang Tang, Lei Bai, Feng Zhu, Rui Zhao, Fengwei Yu, Donglian Qi, Wanli Ouyang

When adapted to a specific task, UniHCP achieves new SOTAs on a wide range of human-centric tasks, e.g., 69.8 mIoU on CIHP for human parsing, 86.18 mA on PA-100K for attribute prediction, 90.3 mAP on Market1501 for ReID, and 85.8 JI on CrowdHuman for pedestrian detection, performing better than specialized models tailored for each task.

Ranked #1 on Pose Estimation on MS-COCO

2D Pose Estimation Attribute +8

135

Paper
Code

Multi-Scale Control Signal-Aware Transformer for Motion Synthesis without Phase

no code implementations • 3 Mar 2023 • Lintao Wang, Kun Hu, Lei Bai, Yu Ding, Wanli Ouyang, Zhiyong Wang

As past poses often contain useful auxiliary hints, in this paper, we propose a task-agnostic deep learning method, namely Multi-scale Control Signal-aware Transformer (MCS-T), with an attention based encoder-decoder architecture to discover the auxiliary information implicitly for synthesizing controllable motion without explicitly requiring auxiliary information such as phase.

Feature Engineering Motion Synthesis

Paper
Add Code

Saliency Guided Contrastive Learning on Scene Images

no code implementations • 22 Feb 2023 • Meilin Chen, Yizhou Wang, Shixiang Tang, Feng Zhu, Haiyang Yang, Lei Bai, Rui Zhao, Donglian Qi, Wanli Ouyang

Although this is feasible, recent works have largely overlooked discovering the most discriminative regions for contrastive learning of object representations in scene images.

Contrastive Learning Representation Learning +1

Paper
Add Code

Learning from pseudo-labels: deep networks improve consistency in longitudinal brain volume estimation

no code implementations • 8 Feb 2023 • Geng Zhan, Dongang Wang, Mariano Cabezas, Lei Bai, Kain Kyle, Wanli Ouyang, Michael Barnett, Chenyu Wang

An accurate and robust quantitative measurement of brain volume change is paramount for translational research and clinical applications.

Paper
Add Code

Fast-BEV: A Fast and Strong Bird's-Eye View Perception Baseline

1 code implementation • 29 Jan 2023 • Yangguang Li, Bin Huang, Zeren Chen, Yufeng Cui, Feng Liang, Mingzhu Shen, Fenggang Liu, Enze Xie, Lu Sheng, Wanli Ouyang, Jing Shao

Our Fast-BEV consists of five parts. We propose (1) a lightweight, deployment-friendly view transformation that quickly transfers 2D image features to 3D voxel space, (2) a multi-scale image encoder that leverages multi-scale information for better performance, (3) an efficient BEV encoder that is specifically designed to speed up on-vehicle inference.

Data Augmentation

525

Paper
Code

$β$-DARTS++: Bi-level Regularization for Proxy-robust Differentiable Architecture Search

1 code implementation • 16 Jan 2023 • Peng Ye, Tong He, Baopu Li, Tao Chen, Lei Bai, Wanli Ouyang

To address the robustness problem, we first benchmark different NAS methods under a wide range of proxy data, proxy channels, proxy layers and proxy epochs, since the robustness of NAS under different kinds of proxies has not been explored before.

Neural Architecture Search

Paper
Code

Open-Set Fine-Grained Retrieval via Prompting Vision-Language Evaluator

no code implementations • CVPR 2023 • Shijie Wang, Jianlong Chang, Haojie Li, Zhihui Wang, Wanli Ouyang, Qi Tian

PLEor leverages a pre-trained CLIP model to infer the discrepancies encompassing both pre-defined and unknown subcategories, called category-specific discrepancies, and transfers them to the backbone network trained in closed-set scenarios.

Knowledge Distillation Retrieval +1

Paper
Add Code

Semi-Supervised Semantic Segmentation under Label Noise via Diverse Learning Groups

no code implementations • ICCV 2023 • Peixia Li, Pulak Purkait, Thalaiyasingam Ajanthan, Majid Abdolshah, Ravi Garg, Hisham Husain, Chenchen Xu, Stephen Gould, Wanli Ouyang, Anton Van Den Hengel

Each learning group consists of a teacher network, a student network and a novel filter module.

Semi-Supervised Semantic Segmentation

Paper
Add Code

Crossing the Gap: Domain Generalization for Image Captioning

no code implementations • CVPR 2023 • Yuchen Ren, Zhendong Mao, Shancheng Fang, Yan Lu, Tong He, Hao Du, Yongdong Zhang, Wanli Ouyang

In this paper, we introduce a new setting called Domain Generalization for Image Captioning (DGIC), where the data from the target domain is unseen in the learning process.

Domain Generalization Image Captioning +1

Paper
Add Code

Learning Multi-Modal Class-Specific Tokens for Weakly Supervised Dense Object Localization

no code implementations • CVPR 2023 • Lian Xu, Wanli Ouyang, Mohammed Bennamoun, Farid Boussaid, Dan Xu

Weakly supervised dense object localization (WSDOL) relies generally on Class Activation Mapping (CAM), which exploits the correlation between the class weights of the image classifier and the pixel-level features.

Object Localization Representation Learning +2

Paper
Add Code

Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?

4 code implementations • CVPR 2023 • Wenhao Wu, Haipeng Luo, Bo Fang, Jingdong Wang, Wanli Ouyang

Most existing text-video retrieval methods focus on cross-modal matching between the visual content of videos and textual query sentences.

Ranked #7 on Video Retrieval on VATEX

Data Augmentation Retrieval +2

200

Paper
Code

Ponder: Point Cloud Pre-training via Neural Rendering

no code implementations • ICCV 2023 • Di Huang, Sida Peng, Tong He, Honghui Yang, Xiaowei Zhou, Wanli Ouyang

We propose a novel approach to self-supervised learning of point cloud representations by differentiable neural rendering.

3D Reconstruction Image Generation +2

Paper
Add Code

Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models

5 code implementations • CVPR 2023 • Wenhao Wu, Xiaohan Wang, Haipeng Luo, Jingdong Wang, Yi Yang, Wanli Ouyang

In this paper, we propose a novel framework called BIKE, which utilizes the cross-modal bridge to explore bidirectional knowledge: i) We introduce the Video Attribute Association mechanism, which leverages the Video-to-Text knowledge to generate textual auxiliary attributes for complementing video recognition.

Ranked #1 on Zero-Shot Action Recognition on ActivityNet

Action Classification Action Recognition +3

200

Paper
Code

MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling with Informative-Preserved Reconstruction and Self-Distilled Consistency

no code implementations • CVPR 2023 • Mingye Xu, Mutian Xu, Tong He, Wanli Ouyang, Yali Wang, Xiaoguang Han, Yu Qiao

Besides, such scenes with progressive masking ratios can also serve to self-distill their intrinsic spatial consistency, which requires learning consistent representations from unmasked areas.

object-detection Object Detection +2

Paper
Add Code

3D Point Cloud Pre-training with Knowledge Distillation from 2D Images

no code implementations • 17 Dec 2022 • Yuan YAO, Yuanhan Zhang, Zhenfei Yin, Jiebo Luo, Wanli Ouyang, Xiaoshui Huang

The recent success of pre-trained 2D vision models is mostly attributable to learning from large-scale datasets.

Concept Alignment Knowledge Distillation +6

Paper
Add Code

EPCL: Frozen CLIP Transformer is An Efficient Point Cloud Encoder

2 code implementations • 8 Dec 2022 • Xiaoshui Huang, Zhou Huang, Sheng Li, Wentao Qu, Tong He, Yuenan Hou, Yifan Zuo, Wanli Ouyang

These token embeddings are concatenated with a task token and fed into the frozen CLIP transformer to learn point cloud representation.

Few-Shot Learning Segmentation +1

259

Paper
Code

GD-MAE: Generative Decoder for MAE Pre-training on LiDAR Point Clouds

1 code implementation • CVPR 2023 • Honghui Yang, Tong He, Jiaheng Liu, Hua Chen, Boxi Wu, Binbin Lin, Xiaofei He, Wanli Ouyang

In contrast to previous 3D MAE frameworks, which either design a complex decoder to infer masked information from maintained regions or adopt sophisticated masking strategies, we instead propose a much simpler paradigm.

102

Paper
Code

Reconstructing Hand-Held Objects from Monocular Video

no code implementations • 30 Nov 2022 • Di Huang, Xiaopeng Ji, Xingyi He, Jiaming Sun, Tong He, Qing Shuai, Wanli Ouyang, Xiaowei Zhou

The key idea is that the hand motion naturally provides multiple views of the object and the motion can be reliably estimated by a hand pose tracker.

Hand Pose Estimation Object

Paper
Add Code

ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency

1 code implementation • 29 Nov 2022 • Chuming Li, Jie Liu, Yinmin Zhang, Yuhong Wei, Yazhe Niu, Yaodong Yang, Yu Liu, Wanli Ouyang

In the learning phase, each agent minimizes the TD error that is dependent on how the subsequent agents have reacted to their chosen action.

Ranked #1 on SMAC on SMAC 3s5z_vs_3s6z

Decision Making Q-Learning +2

176

Paper
Code

3D-QueryIS: A Query-based Framework for 3D Instance Segmentation

no code implementations • 17 Nov 2022 • Jiaheng Liu, Tong He, Honghui Yang, Rui Su, Jiayi Tian, Junran Wu, Hongcheng Guo, Ke Xu, Wanli Ouyang

Previous top-performing methods for 3D instance segmentation often maintain inter-task dependencies and tend to lack robustness.

3D Instance Segmentation Segmentation +1

Paper
Add Code

Boosting Semi-Supervised 3D Object Detection with Semi-Sampling

no code implementations • 14 Nov 2022 • Xiaopei Wu, Yang Zhao, Liang Peng, Hua Chen, Xiaoshui Huang, Binbin Lin, Haifeng Liu, Deng Cai, Wanli Ouyang

When training a teacher-student semi-supervised framework, we randomly select ground-truth (gt) samples and pseudo samples and add them to both labeled and unlabeled frames, providing strong data augmentation for them.

3D Object Detection Data Augmentation +2

Paper
Add Code

The Equalization Losses: Gradient-Driven Training for Long-tailed Object Recognition

1 code implementation • 11 Oct 2022 • Jingru Tan, Bo Li, Xin Lu, Yongqiang Yao, Fengwei Yu, Tong He, Wanli Ouyang

Long-tail distribution is widely spread in real-world applications.

Image Classification Long-tailed Object Detection +4

422

Paper
Code

Stimulative Training of Residual Networks: A Social Psychology Perspective of Loafing

1 code implementation • 9 Oct 2022 • Peng Ye, Shengji Tang, Baopu Li, Tao Chen, Wanli Ouyang

In this work, we aim to re-investigate the training process of residual networks from a novel social psychology perspective of loafing, and further propose a new training strategy to strengthen the performance of residual networks.

Paper
Code

CLIP2Point: Transfer CLIP to Point Cloud Classification with Image-Depth Pre-training

1 code implementation • ICCV 2023 • Tianyu Huang, Bowen Dong, Yunhan Yang, Xiaoshui Huang, Rynson W. H. Lau, Wanli Ouyang, WangMeng Zuo

To address this issue, we propose CLIP2Point, an image-depth pre-training method by contrastive learning to transfer CLIP to the 3D domain, and adapt it to point cloud classification.

Ranked #3 on Training-free 3D Point Cloud Classification on ScanObjectNN (using extra training data)

Contrastive Learning Few-Shot Learning +4

Paper
Code

Towards Frame Rate Agnostic Multi-Object Tracking

1 code implementation • 23 Sep 2022 • Weitao Feng, Lei Bai, Yongqiang Yao, Fengwei Yu, Wanli Ouyang

In this paper, we propose a Frame Rate Agnostic MOT framework with a Periodic training Scheme (FAPS) to tackle the FraMOT problem for the first time.

Multi-Object Tracking Object

Paper
Code

ZoomNAS: Searching for Whole-body Human Pose Estimation in the Wild

1 code implementation • 23 Aug 2022 • Lumin Xu, Sheng Jin, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, Xiaogang Wang

We propose a single-network approach, termed ZoomNet, to take into account the hierarchical structure of the full human body and solve the scale variation of different body parts.

Ranked #2 on 2D Human Pose Estimation on COCO-WholeBody

2D Human Pose Estimation Neural Architecture Search +1

707

Paper
Code

An Empirical Study of Pseudo-Labeling for Image-based 3D Object Detection

no code implementations • 15 Aug 2022 • Xinzhu Ma, Yuan Meng, Yinmin Zhang, Lei Bai, Jun Hou, Shuai Yi, Wanli Ouyang

We hope this work can provide insights for the image-based 3D detection community under a semi-supervised setting.

3D Object Detection Autonomous Driving +1

Paper
Add Code

Fine-grained Retrieval Prompt Tuning

no code implementations • 29 Jul 2022 • Shijie Wang, Jianlong Chang, Zhihui Wang, Haojie Li, Wanli Ouyang, Qi Tian

In this paper, we develop Fine-grained Retrieval Prompt Tuning (FRPT), which steers a frozen pre-trained model to perform the fine-grained retrieval task from the perspectives of sample prompting and feature adaptation.

Retrieval

Paper
Add Code

3D Interacting Hand Pose Estimation by Hand De-occlusion and Removal

1 code implementation • 22 Jul 2022 • Hao Meng, Sheng Jin, Wentao Liu, Chen Qian, Mengxiang Lin, Wanli Ouyang, Ping Luo

Unlike most previous works that directly predict the 3D poses of two interacting hands simultaneously, we propose to decompose the challenging interacting hand pose estimation task and estimate the pose of each hand separately.

3D Interacting Hand Pose Estimation Hand Pose Estimation

Paper
Code

NSNet: Non-saliency Suppression Sampler for Efficient Video Recognition

no code implementations • 21 Jul 2022 • Boyang Xia, Wenhao Wu, Haoran Wang, Rui Su, Dongliang He, Haosen Yang, Xiaoran Fan, Wanli Ouyang

On the video level, a temporal attention module is learned under dual video-level supervisions on both the salient and the non-salient representations.

Ranked #4 on Action Recognition on ActivityNet

Action Recognition Video Classification +1

Paper
Add Code

Pose for Everything: Towards Category-Agnostic Pose Estimation

1 code implementation • 21 Jul 2022 • Lumin Xu, Sheng Jin, Wang Zeng, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, Xiaogang Wang

In this paper, we introduce the task of Category-Agnostic Pose Estimation (CAPE), which aims to create a pose estimation model capable of detecting the pose of any class of object given only a few samples with keypoint definition.

Ranked #4 on 2D Pose Estimation on MP-100

Category-Agnostic Pose Estimation Pose Estimation

183

Paper
Code

Fast-MoCo: Boost Momentum-based Contrastive Learning with Combinatorial Patches

1 code implementation • 17 Jul 2022 • Yuanzheng Ci, Chen Lin, Lei Bai, Wanli Ouyang

Contrastive-based self-supervised learning methods achieved great success in recent years.

Contrastive Learning Self-Supervised Learning

Paper
Code

Action Recognition With Motion Diversification and Dynamic Selection

no code implementations • TIP 2022 • Peiqin Zhuang, Yu Guo, Zhipeng Yu, Luping Zhou, Lei Bai, Ding Liang, Zhiyong Wang, Yali Wang, Wanli Ouyang

To address this issue, we introduce a Motion Diversification and Selection (MoDS) module to generate diversified spatio-temporal motion features and then select the suitable motion representation dynamically for categorizing the input video.

Ranked #18 on Action Recognition on Something-Something V1

Action Recognition Computational Efficiency

Paper
Add Code

Revisiting Classifier: Transferring Vision-Language Models for Video Recognition

5 code implementations • 4 Jul 2022 • Wenhao Wu, Zhun Sun, Wanli Ouyang

In this study, we focus on transferring knowledge for video classification tasks.

Ranked #1 on Action Recognition on ActivityNet

Action Classification Action Recognition +5

200

Paper
Code

TransVG++: End-to-End Visual Grounding with Language Conditioned Vision Transformer

1 code implementation • 14 Jun 2022 • Jiajun Deng, Zhengyuan Yang, Daqing Liu, Tianlang Chen, Wengang Zhou, Yanyong Zhang, Houqiang Li, Wanli Ouyang

For another, we devise Language Conditioned Vision Transformer that removes external fusion modules and reuses the uni-modal ViT for vision-language fusion at the intermediate layers.

Visual Grounding

149

Paper
Code

Better Teacher Better Student: Dynamic Prior Knowledge for Knowledge Distillation

1 code implementation • 13 Jun 2022 • Zengyu Qiu, Xinzhu Ma, Kunlin Yang, Chunya Liu, Jun Hou, Shuai Yi, Wanli Ouyang

Besides, our DPK makes the performance of the student model positively correlated with that of the teacher model, which means that we can further boost the accuracy of students by applying larger teachers.

Image Classification Knowledge Distillation +3

Paper
Code

Domain Invariant Masked Autoencoders for Self-supervised Learning from Multi-domains

no code implementations • 10 May 2022 • Haiyang Yang, Meilin Chen, Yizhou Wang, Shixiang Tang, Feng Zhu, Lei Bai, Rui Zhao, Wanli Ouyang

While recent self-supervised learning methods have achieved good performance when the evaluation set comes from the same domain as the training set, their performance drops undesirably when they are tested on a different domain.

Self-Supervised Learning

Paper
Add Code

MS Lesion Segmentation: Revisiting Weighting Mechanisms for Federated Learning

no code implementations • 3 May 2022 • Dongnan Liu, Mariano Cabezas, Dongang Wang, Zihao Tang, Lei Bai, Geng Zhan, Yuling Luo, Kain Kyle, Linda Ly, James Yu, Chun-Chien Shieh, Aria Nguyen, Ettikan Kandasamy Karuppiah, Ryan Sullivan, Fernando Calamante, Michael Barnett, Wanli Ouyang, Weidong Cai, Chenyu Wang

In addition, the segmentation loss function in each client is also re-weighted according to the lesion volume for the data during training.

Federated Learning Lesion Segmentation +1

Paper
Add Code

Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer

1 code implementation • CVPR 2022 • Wang Zeng, Sheng Jin, Wentao Liu, Chen Qian, Ping Luo, Wanli Ouyang, Xiaogang Wang

Vision transformers have achieved great successes in many computer vision tasks.

Ranked #4 on 2D Human Pose Estimation on COCO-WholeBody

2D Human Pose Estimation 3D Human Pose Estimation +1

179

Paper
Code

Unsupervised Learning of Accurate Siamese Tracking

1 code implementation • CVPR 2022 • Qiuhong Shen, Lei Qiao, Jinyang Guo, Peixia Li, Xin Li, Bo Li, Weitao Feng, Weihao Gan, Wei Wu, Wanli Ouyang

As unlimited self-supervision signals can be obtained by tracking a video along a cycle in time, we investigate evolving a Siamese tracker by tracking videos forward-backward.

Visual Object Tracking

Paper
Code

SeCo: Separating Unknown Musical Visual Sounds with Consistency Guidance

no code implementations • 25 Mar 2022 • Xinchi Zhou, Dongzhan Zhou, Wanli Ouyang, Hang Zhou, Ziwei Liu, Di Hu

Recent years have witnessed the success of deep learning on the visual sound separation task.

Paper
Add Code

DR.VIC: Decomposition and Reasoning for Video Individual Counting

2 code implementations • CVPR 2022 • Tao Han, Lei Bai, Junyu Gao, Qi Wang, Wanli Ouyang

Instead of relying on the Multiple Object Tracking (MOT) techniques, we propose to solve the problem by decomposing all pedestrians into the initial pedestrians who existed in the first frame and the new pedestrians with separate identities in each following frame.

Crowd Counting Density Estimation +2

Paper
Code

Backbone is All Your Need: A Simplified Architecture for Visual Object Tracking

1 code implementation • 10 Mar 2022 • BoYu Chen, Peixia Li, Lei Bai, Lei Qiao, Qiuhong Shen, Bo Li, Weihao Gan, Wei Wu, Wanli Ouyang

Exploiting a general-purpose neural architecture to replace hand-wired designs or inductive biases has recently drawn extensive interest.

Visual Object Tracking

Paper
Code

Multi-class Token Transformer for Weakly Supervised Semantic Segmentation

1 code implementation • CVPR 2022 • Lian Xu, Wanli Ouyang, Mohammed Bennamoun, Farid Boussaid, Dan Xu

To this end, we propose a Multi-class Token Transformer, termed as MCTformer, which uses multiple class tokens to learn interactions between the class tokens and the patch tokens.

Object Object Localization +2

139

Paper
Code

$β$-DARTS: Beta-Decay Regularization for Differentiable Architecture Search

1 code implementation • 3 Mar 2022 • Peng Ye, Baopu Li, Yikang Li, Tao Chen, Jiayuan Fan, Wanli Ouyang

Neural Architecture Search (NAS) has attracted increasingly more attention in recent years because of its capability to design deep neural networks automatically.

Ranked #1 on Neural Architecture Search on NAS-Bench-201, CIFAR-100

Neural Architecture Search

Paper
Code

3D Object Detection from Images for Autonomous Driving: A Survey

1 code implementation • 7 Feb 2022 • Xinzhu Ma, Wanli Ouyang, Andrea Simonelli, Elisa Ricci

3D object detection from images, one of the fundamental and challenging problems in autonomous driving, has received increasing attention from both industry and academia in recent years.

3D Object Detection Autonomous Driving +1

135

Paper
Code

Trajectory Forecasting from Detection with Uncertainty-Aware Motion Encoding

no code implementations • 3 Feb 2022 • Pu Zhang, Lei Bai, Jianru Xue, Jianwu Fang, Nanning Zheng, Wanli Ouyang

Trajectories obtained from object detection and tracking are inevitably noisy, which could cause serious forecasting errors to predictors built on ground truth trajectories.

object-detection Object Detection +1

Paper
Add Code

MonoDistill: Learning Spatial Features for Monocular 3D Object Detection

1 code implementation • ICLR 2022 • Zhiyu Chong, Xinzhu Ma, Hong Zhang, Yuxin Yue, Haojie Li, Zhihui Wang, Wanli Ouyang

Finally, this LiDAR Net can serve as the teacher to transfer the learned knowledge to the baseline model.

Monocular 3D Object Detection Object +2

Paper
Code

Pseudo-Labeled Auto-Curriculum Learning for Semi-Supervised Keypoint Localization

no code implementations • ICLR 2022 • Can Wang, Sheng Jin, Yingda Guan, Wentao Liu, Chen Qian, Ping Luo, Wanli Ouyang

PL approaches apply pseudo-labels to unlabeled data, and then train the model with a combination of the labeled and pseudo-labeled data iteratively.

Paper
Add Code

RePre: Improving Self-Supervised Vision Transformer with Reconstructive Pre-training

no code implementations • 18 Jan 2022 • Luya Wang, Feng Liang, Yangguang Li, Honggang Zhang, Wanli Ouyang, Jing Shao

Recently, self-supervised vision transformers have attracted unprecedented attention for their impressive representation learning ability.

Contrastive Learning Representation Learning

Paper
Add Code

$β$-DARTS: Beta-Decay Regularization for Differentiable Architecture Search

1 code implementation • CVPR 2022 • Peng Ye, Baopu Li, Yikang Li, Tao Chen, Jiayuan Fan, Wanli Ouyang

Neural Architecture Search (NAS) has attracted increasingly more attention in recent years because of its capability to design deep neural networks automatically.

Neural Architecture Search

Paper
Code

Accelerating Neural Network Optimization Through an Automated Control Theory Lens

no code implementations • CVPR 2022 • Jiahao Wang, Baoyuan Wu, Rui Su, Mingdeng Cao, Shuwei Shi, Wanli Ouyang, Yujiu Yang

We conduct experiments both from a control theory lens through a phase locus verification and from a network training lens on several models, including CNNs, Transformers, MLPs, and on benchmark datasets.

Math

Paper
Add Code

Revisiting the Transferability of Supervised Pretraining: an MLP Perspective

no code implementations • CVPR 2022 • Yizhou Wang, Shixiang Tang, Feng Zhu, Lei Bai, Rui Zhao, Donglian Qi, Wanli Ouyang

The pretrain-finetune paradigm is a classical pipeline in visual learning.

domain classification object-detection +2

Paper
Add Code

A Continuous Mapping For Augmentation Design

no code implementations • NeurIPS 2021 • Keyu Tian, Chen Lin, Ser Nam Lim, Wanli Ouyang, Puneet Dokania, Philip Torr

Automated data augmentation (ADA) techniques have played an important role in boosting the performance of deep models.

Data Augmentation

Paper
Add Code

Unsupervised Contrastive Learning with Simple Transformation for 3D Point Cloud Data

no code implementations • 13 Oct 2021 • Jincen Jiang, Xuequan Lu, Wanli Ouyang, Meili Wang

Though a number of point cloud learning methods have been proposed to handle unordered points, most of them are supervised and require labels for training.

3D Object Classification Classification +4

Paper
Add Code

Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm

3 code implementations • ICLR 2022 • Yangguang Li, Feng Liang, Lichen Zhao, Yufeng Cui, Wanli Ouyang, Jing Shao, Fengwei Yu, Junjie Yan

Recently, large-scale Contrastive Language-Image Pre-training (CLIP) has attracted unprecedented attention for its impressive zero-shot recognition ability and excellent transferability to downstream tasks.

Zero-Shot Learning

649

Paper
Code

Deep Instance Segmentation with Automotive Radar Detection Points

no code implementations • 5 Oct 2021 • Jianan Liu, Weiyi Xiong, Liping Bai, Yuxuan Xia, Tao Huang, Wanli Ouyang, Bing Zhu

Automotive radar provides reliable environmental perception in all-weather conditions with affordable cost, but it hardly supplies semantic and geometry information due to the sparsity of radar detection points.

Autonomous Driving Clustering +3

Paper
Add Code

Improving the Transferability of Supervised Pretraining with an MLP Projector

no code implementations • 29 Sep 2021 • Yizhou Wang, Shixiang Tang, Feng Zhu, Lei Bai, Rui Zhao, Donglian Qi, Wanli Ouyang

The pretrain-finetune paradigm is a classical pipeline in visual learning.

domain classification

Paper
Add Code

Graph-Based 3D Multi-Person Pose Estimation Using Multi-View Images

1 code implementation • ICCV 2021 • Size Wu, Sheng Jin, Wentao Liu, Lei Bai, Chen Qian, Dong Liu, Wanli Ouyang

Following the top-down paradigm, we decompose the task into two stages, i.e., person localization and pose estimation.

Ranked #2 on 3D Multi-Person Pose Estimation on Panoptic (using extra training data)

3D Multi-Person Pose Estimation 3D Pose Estimation +1

Paper
Code

Towards Balanced Learning for Instance Recognition

no code implementations • 23 Aug 2021 • Jiangmiao Pang, Kai Chen, Qi Li, Zhihai Xu, Huajun Feng, Jianping Shi, Wanli Ouyang, Dahua Lin

In this work, we carefully revisit the standard training practice of detectors, and find that the detection performance is often limited by the imbalance during the training process, which generally consists in three levels - sample level, feature level, and objective level.

Paper
Add Code

BN-NAS: Neural Architecture Search with Batch Normalization

1 code implementation • ICCV 2021 • BoYu Chen, Peixia Li, Baopu Li, Chen Lin, Chuming Li, Ming Sun, Junjie Yan, Wanli Ouyang

We present neural architecture search with Batch Normalization (BN-NAS) to accelerate neural architecture search (NAS).

Neural Architecture Search

Paper
Code

PSViT: Better Vision Transformer via Token Pooling and Attention Sharing

no code implementations • 7 Aug 2021 • BoYu Chen, Peixia Li, Baopu Li, Chuming Li, Lei Bai, Chen Lin, Ming Sun, Junjie Yan, Wanli Ouyang

Then, a compact set of the possible combinations of different token pooling and attention sharing mechanisms is constructed.

Paper
Add Code

Geometry Uncertainty Projection Network for Monocular 3D Object Detection

1 code implementation • ICCV 2021 • Yan Lu, Xinzhu Ma, Lei Yang, Tianzhu Zhang, Yating Liu, Qi Chu, Junjie Yan, Wanli Ouyang

In this paper, we propose a Geometry Uncertainty Projection Network (GUP Net) to tackle the error amplification problem at both inference and training stages.

Ranked #2 on 3D Object Detection From Monocular Images on Waymo Open Dataset

3D Object Detection From Monocular Images Depth Estimation +3

126

Paper
Code

Learning Geometry-Guided Depth via Projective Modeling for Monocular 3D Object Detection

1 code implementation • 29 Jul 2021 • Yinmin Zhang, Xinzhu Ma, Shuai Yi, Jun Hou, Zhihui Wang, Wanli Ouyang, Dan Xu

In this paper, we propose to learn geometry-guided depth estimation with projective modeling to advance monocular 3D object detection.

Ranked #10 on Monocular 3D Object Detection on KITTI Cars Moderate

Autonomous Driving Depth Estimation +4

Paper
Code

Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised Semantic Segmentation

1 code implementation • ICCV 2021 • Lian Xu, Wanli Ouyang, Mohammed Bennamoun, Farid Boussaid, Ferdous Sohel, Dan Xu

Motivated by the significant inter-task correlation, we propose a novel weakly supervised multi-task framework, termed AuxSegNet, that leverages saliency detection and multi-label image classification as auxiliary tasks to improve the primary task of semantic segmentation using only image-level ground-truth labels.

Auxiliary Learning Multi-Label Image Classification +6

Paper
Code

GLiT: Neural Architecture Search for Global and Local Image Transformer

2 code implementations • ICCV 2021 • BoYu Chen, Peixia Li, Chuming Li, Baopu Li, Lei Bai, Chen Lin, Ming Sun, Junjie Yan, Wanli Ouyang

We introduce the first Neural Architecture Search (NAS) method to find a better transformer architecture for image recognition.

Ranked #498 on Image Classification on ImageNet

Image Classification Neural Architecture Search

Paper
Code

Mutual CRF-GNN for Few-Shot Learning

no code implementations • CVPR 2021 • Shixiang Tang, Dapeng Chen, Lei Bai, Kaijian Liu, Yixiao Ge, Wanli Ouyang

In this MCGN, the labels and features of support data are used by the CRF for inferring GNN affinities in a principled and probabilistic way.

Few-Shot Learning

Paper
Add Code

AutoSampling: Search for Effective Data Sampling Schedules

no code implementations • 28 May 2021 • Ming Sun, Haoxuan Dou, Baopu Li, Lei Cui, Junjie Yan, Wanli Ouyang

Data sampling plays a pivotal role in training deep learning models.

Image Classification

Paper
Add Code

ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search

4 code implementations • CVPR 2021 • Lumin Xu, Yingda Guan, Sheng Jin, Wentao Liu, Chen Qian, Ping Luo, Wanli Ouyang, Xiaogang Wang

Human pose estimation has achieved significant progress in recent years.

Ranked #23 on Pose Estimation on COCO test-dev (using extra training data)

Neural Architecture Search Pose Estimation

4,971

Paper
Code

Layerwise Optimization by Gradient Decomposition for Continual Learning

no code implementations • CVPR 2021 • Shixiang Tang, Dapeng Chen, Jinguo Zhu, Shijie Yu, Wanli Ouyang

The gradient for update should be close to the gradient of the new task, consistent with the gradients shared by all old tasks, and orthogonal to the space spanned by the gradients specific to the old tasks.

Continual Learning

Paper
Add Code

PyMAF: 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop

2 code implementations • ICCV 2021 • Hongwen Zhang, Yating Tian, Xinchi Zhou, Wanli Ouyang, Yebin Liu, LiMin Wang, Zhenan Sun

Regression-based methods have recently shown promising results in reconstructing human meshes from monocular images.

Ranked #5 on 3D Human Pose Estimation on AGORA (using extra training data)

3D human pose and shape estimation 3D Human Reconstruction +2

576

Paper
Code

Delving into Localization Errors for Monocular 3D Object Detection

1 code implementation • CVPR 2021 • Xinzhu Ma, Yinmin Zhang, Dan Xu, Dongzhan Zhou, Shuai Yi, Haojie Li, Wanli Ouyang

Estimating 3D bounding boxes from monocular images is an essential component in autonomous driving, while accurate 3D object detection from this kind of data is very challenging.

Ranked #7 on 3D Object Detection From Monocular Images on KITTI-360

3D Object Detection From Monocular Images Autonomous Driving +3

153

Paper
Code

Gradient Regularized Contrastive Learning for Continual Domain Adaptation

no code implementations • 23 Mar 2021 • Shixiang Tang, Peng Su, Dapeng Chen, Wanli Ouyang

To better understand this issue, we study the problem of continual domain adaptation, where the model is presented with a labelled source domain and a sequence of unlabelled target domains.

Contrastive Learning Domain Adaptation

Paper
Add Code

Real-Time Visual Object Tracking via Few-Shot Learning

no code implementations • 18 Mar 2021 • Jinghao Zhou, Bo Li, Peng Wang, Peixia Li, Weihao Gan, Wei Wu, Junjie Yan, Wanli Ouyang

Visual Object Tracking (VOT) can be seen as an extended task of Few-Shot Learning (FSL).

Few-Shot Learning Object +2

Paper
Add Code

Higher Performance Visual Tracking with Dual-Modal Localization

no code implementations • 18 Mar 2021 • Jinghao Zhou, Bo Li, Lei Qiao, Peng Wang, Weihao Gan, Wei Wu, Junjie Yan, Wanli Ouyang

Visual Object Tracking (VOT) has synchronous needs for both robustness and accuracy.

regression Visual Object Tracking +1

Paper
Add Code

Probabilistic Graph Attention Network with Conditional Kernels for Pixel-Wise Prediction

no code implementations • 8 Jan 2021 • Dan Xu, Xavier Alameda-Pineda, Wanli Ouyang, Elisa Ricci, Xiaogang Wang, Nicu Sebe

In contrast to previous works directly considering multi-scale feature maps obtained from the inner layers of a primary CNN architecture, and simply fusing the features with weighted averaging or concatenation, we propose a probabilistic graph attention network structure based on a novel Attention-Gated Conditional Random Fields (AG-CRFs) model for learning and fusing multi-scale representations in a principled manner.

Graph Attention Monocular Depth Estimation +1

Paper
Add Code

Aggregation With Feature Detection

no code implementations • ICCV 2021 • Shuyang Sun, Xiaoyu Yue, Xiaojuan Qi, Wanli Ouyang, Victor Adrian Prisacariu, Philip H.S. Torr

Aggregating features from different depths of a network is widely adopted to improve the network capability.

Instance Segmentation object-detection +2

Paper
Add Code

Inception Convolution with Efficient Dilation Search

1 code implementation • CVPR 2021 • Jie Liu, Chuming Li, Feng Liang, Chen Lin, Ming Sun, Junjie Yan, Wanli Ouyang, Dong Xu

To develop a practical method for learning complex inception convolution based on the data, a simple but effective search algorithm, referred to as efficient dilation optimization (EDO), is developed.

Human Detection Instance Segmentation +4

112

Paper
Code

DETR for Crowd Pedestrian Detection

1 code implementation • 12 Dec 2020 • Matthieu Lin, Chuming Li, Xingyuan Bu, Ming Sun, Chen Lin, Junjie Yan, Wanli Ouyang, Zhidong Deng

Furthermore, the bipartite match of ED harms the training efficiency due to the large ground truth number in crowd scenes.

Pedestrian Detection

Paper
Code

Direct Depth Learning Network for Stereo Matching

no code implementations • 10 Dec 2020 • Hong Zhang, Haojie Li, Shenglun Chen, Tiantian Yan, Zhihui Wang, Guo Lu, Wanli Ouyang

To make the Adaptive-Grained Depth Refinement stage robust to the coarse depth and adaptive to the depth range of the points, the Granularity Uncertainty is introduced to the Adaptive-Grained Depth Refinement stage.

Autonomous Driving Depth Estimation +1

Paper
Add Code

Full Matching on Low Resolution for Disparity Estimation

no code implementations • 10 Dec 2020 • Hong Zhang, Shenglun Chen, Zhihui Wang, Haojie Li, Wanli Ouyang

To this end, we first propose to decompose the full matching task into multiple stages of the cost aggregation module.

Disparity Estimation

Paper
Add Code

Temporal-Channel Transformer for 3D Lidar-Based Video Object Detection in Autonomous Driving

no code implementations • 27 Nov 2020 • Zhenxun Yuan, Xiao Song, Lei Bai, Wengang Zhou, Zhe Wang, Wanli Ouyang

As a special design of this transformer, the information encoded in the encoder is different from that in the decoder, i.e., the encoder encodes temporal-channel information of multiple frames while the decoder decodes the spatial-channel information for the current frame in a voxel-wise manner.

3D Object Detection Autonomous Driving +3

Paper
Add Code

Evolving Search Space for Neural Architecture Search

1 code implementation • ICCV 2021 • Yuanzheng Ci, Chen Lin, Ming Sun, BoYu Chen, Hongwen Zhang, Wanli Ouyang

The automation of neural architecture design has been a coveted alternative to human experts.

Knowledge Distillation Neural Architecture Search

Paper
Code

Adaptive Gradient Method with Resilience and Momentum

no code implementations • 21 Oct 2020 • Jie Liu, Chen Lin, Chuming Li, Lu Sheng, Ming Sun, Junjie Yan, Wanli Ouyang

Several variants of stochastic gradient descent (SGD) have been proposed to improve the learning effectiveness and efficiency when training deep neural networks, among which some recent influential attempts adaptively control the parameter-wise learning rate (e.g., Adam and RMSProp).

Paper
Add Code

Category-specific Semantic Coherency Learning for Fine-grained Image Recognition

no code implementations • 12 Oct 2020 • Shijie Wang, Zhihui Wang, Haojie Li, Wanli Ouyang

Existing deep learning based weakly supervised fine-grained image recognition (WFGIR) methods usually pick out the discriminative regions from the high-level feature (HLF) maps directly.

Attribute Fine-Grained Image Recognition

Paper
Add Code

Once Quantization-Aware Training: High Performance Extremely Low-bit Architecture Search

1 code implementation • ICCV 2021 • Mingzhu Shen, Feng Liang, Ruihao Gong, Yuhang Li, Chuming Li, Chen Lin, Fengwei Yu, Junjie Yan, Wanli Ouyang

Therefore, we propose to combine Network Architecture Search methods with quantization to enjoy the merits of the two sides.

Neural Architecture Search Quantization +1

Paper
Code

Improving Auto-Augment via Augmentation-Wise Weight Sharing

1 code implementation • NeurIPS 2020 • Keyu Tian, Chen Lin, Ming Sun, Luping Zhou, Junjie Yan, Wanli Ouyang

On CIFAR-10, we achieve a top-1 error rate of 1.24%, which is currently the best-performing single model without extra training data.

Paper
Code

Once Quantized for All: Progressively Searching for Quantized Compact Models

no code implementations • 28 Sep 2020 • Mingzhu Shen, Feng Liang, Chuming Li, Chen Lin, Ming Sun, Junjie Yan, Wanli Ouyang

Automatic search of Quantized Neural Networks (QNN) has attracted a lot of attention.

Neural Architecture Search Quantization

Paper
Add Code

SAMOT: Switcher-Aware Multi-Object Tracking and Still Another MOT Measure

no code implementations • 22 Sep 2020 • Weitao Feng, Zhihao Hu, Baopu Li, Weihao Gan, Wei Wu, Wanli Ouyang

In addition, we propose a new MOT evaluation measure, Still Another IDF score (SAIDF), which focuses more on identity issues. This new measure may overcome some problems of the previous measures and provide better insight into identity issues in MOT.

Multi-Object Tracking Object

Paper
Add Code

Improving Deep Video Compression by Resolution-adaptive Flow Coding

no code implementations • ECCV 2020 • Zhihao Hu, Zhenghao Chen, Dong Xu, Guo Lu, Wanli Ouyang, Shuhang Gu

In this work, we propose a new framework called Resolution-adaptive Flow Coding (RaFC) to effectively compress the flow maps globally and locally, in which we use multi-resolution representations instead of single-resolution representations for both the input flow maps and the output motion features of the MV encoder.

Optical Flow Estimation Video Compression

Paper
Add Code

Exploring the Hierarchy in Relation Labels for Scene Graph Generation

no code implementations • 12 Sep 2020 • Yi Zhou, Shuyang Sun, Chao Zhang, Yikang Li, Wanli Ouyang

By assigning each relationship a single label, current approaches formulate the relationship detection as a classification problem.

Graph Generation Relation +2

Paper
Add Code

BriNet: Towards Bridging the Intra-class and Inter-class Gaps in One-Shot Segmentation

1 code implementation • 14 Aug 2020 • Xianghui Yang, Bairun Wang, Kaige Chen, Xinchi Zhou, Shuai Yi, Wanli Ouyang, Luping Zhou

(2) The object categories at the training and inference stages have no overlap, leaving the inter-class gap.

One-Shot Segmentation

Paper
Code

Rethinking Pseudo-LiDAR Representation

1 code implementation • ECCV 2020 • Xinzhu Ma, Shinan Liu, Zhiyi Xia, Hongwen Zhang, Xingyu Zeng, Wanli Ouyang

Based on this observation, we design an image based CNN detector named Patch-Net, which is more generalized and can be instantiated as pseudo-LiDAR based 3D detectors.

161

Paper
Code

Whole-Body Human Pose Estimation in the Wild

2 code implementations • ECCV 2020 • Sheng Jin, Lumin Xu, Jin Xu, Can Wang, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo

This paper investigates the task of 2D human whole-body pose estimation, which aims to localize dense landmarks on the entire human body including face, hands, body, and feet.

Ranked #8 on 2D Human Pose Estimation on COCO-WholeBody

2D Human Pose Estimation Facial Landmark Detection +2

4,971

Paper
Code

Differentiable Hierarchical Graph Grouping for Multi-Person Pose Estimation

no code implementations • ECCV 2020 • Sheng Jin, Wentao Liu, Enze Xie, Wenhai Wang, Chen Qian, Wanli Ouyang, Ping Luo

The modules of HGG can be trained end-to-end with the keypoint detection network and are able to supervise the grouping process in a hierarchical manner.

Ranked #3 on Keypoint Detection on OCHuman

2D Human Pose Estimation Clustering +4

Paper
Add Code

3D Human Mesh Regression with Dense Correspondence

3 code implementations • CVPR 2020 • Wang Zeng, Wanli Ouyang, Ping Luo, Wentao Liu, Xiaogang Wang

This paper proposes a model-free 3D human mesh estimation framework, named DecoMR, which explicitly establishes the dense correspondence between the mesh and the local image features in the UV space (i.e., a 2D space used for texture mapping of 3D mesh).

Ranked #1 on 3D Human Reconstruction on Surreal

3D Human Pose Estimation 3D Human Reconstruction +1

164

Paper
Code

Scope Head for Accurate Localization in Object Detection

no code implementations • 11 May 2020 • Geng Zhan, Dan Xu, Guo Lu, Wei Wu, Chunhua Shen, Wanli Ouyang

Existing anchor-based and anchor-free object detectors in multi-stage or one-stage pipelines have achieved very promising detection performance.

Object object-detection +2

Paper
Add Code

Cheaper Pre-training Lunch: An Efficient Paradigm for Object Detection

no code implementations • ECCV 2020 • Dongzhan Zhou, Xinchi Zhou, Hongwen Zhang, Shuai Yi, Wanli Ouyang

In this paper, we propose a general and efficient pre-training paradigm, Montage pre-training, for object detection.

object-detection Object Detection

Paper
Add Code

Location-Aware Feature Selection Text Detection Network

no code implementations • 23 Apr 2020 • Zengyuan Guo, Zilin Wang, Zhihui Wang, Wanli Ouyang, Haojie Li, Wen Gao

However, they lag behind recent segmentation-based text detectors in accuracy.

feature selection regression +2

Paper
Add Code

Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition

4 code implementations • CVPR 2020 • Ziyu Liu, Hongwen Zhang, Zhenghao Chen, Zhiyong Wang, Wanli Ouyang

Spatial-temporal graphs have been widely used by skeleton-based action recognition algorithms to model human action dynamics.

Ranked #4 on 3D Action Recognition on Assembly101

Long-range modeling Skeleton Based Action Recognition

853

Paper
Code

Content Adaptive and Error Propagation Aware Deep Video Compression

no code implementations • ECCV 2020 • Guo Lu, Chunlei Cai, Xiaoyun Zhang, Li Chen, Wanli Ouyang, Dong Xu, Zhiyong Gao

Therefore, the encoder is adaptive to different video contents and achieves better compression performance by reducing the domain gap between the training and testing datasets.

Video Compression

Paper
Add Code

Channel Pruning Guided by Classification Loss and Feature Importance

no code implementations • 15 Mar 2020 • Jinyang Guo, Wanli Ouyang, Dong Xu

To this end, we propose a new strategy to suppress the influence of unimportant features (i.e., the features that will be removed at the next pruning stage).

Classification Feature Importance +1

Paper
Add Code

Equalization Loss for Long-Tailed Object Recognition

1 code implementation • CVPR 2020 • Jingru Tan, Changbao Wang, Buyu Li, Quanquan Li, Wanli Ouyang, Changqing Yin, Junjie Yan

Based on it, we propose a simple but effective loss, named equalization loss, to tackle the problem of long-tailed rare categories by simply ignoring those gradients for rare categories.

Ranked #17 on Long-tail Learning on CIFAR-10-LT (ρ=10)

Long-tail Learning Object +3

201

Paper
Code

EcoNAS: Finding Proxies for Economical Neural Architecture Search

no code implementations • CVPR 2020 • Dongzhan Zhou, Xinchi Zhou, Wenwei Zhang, Chen Change Loy, Shuai Yi, Xuesen Zhang, Wanli Ouyang

While many methods have been proposed to improve the efficiency of NAS, the search process is still laborious because training and evaluating plausible architectures over a large search space is time-consuming.

Neural Architecture Search

Paper
Add Code

Learning 3D Human Shape and Pose from Dense Body Parts

1 code implementation • 31 Dec 2019 • Hongwen Zhang, Jie Cao, Guo Lu, Wanli Ouyang, Zhenan Sun

Reconstructing 3D human shape and pose from monocular images is challenging despite the promising results achieved by the most recent learning-based methods.

Ranked #76 on 3D Human Pose Estimation on 3DPW (MPJPE metric)

3D human pose and shape estimation 3D Human Reconstruction +3

217

Paper
Code

Computation Reallocation for Object Detection

no code implementations • ICLR 2020 • Feng Liang, Chen Lin, Ronghao Guo, Ming Sun, Wei Wu, Junjie Yan, Wanli Ouyang

However, the classification allocation pattern is usually applied directly to object detectors, which proves to be sub-optimal.

Instance Segmentation Neural Architecture Search +4

Paper
Add Code

A Shape Transformation-based Dataset Augmentation Framework for Pedestrian Detection

no code implementations • 15 Dec 2019 • Zhe Chen, Wanli Ouyang, Tongliang Liu, DaCheng Tao

Alternatively, to access much more natural-looking pedestrians, we propose to augment pedestrian detection datasets by transforming real pedestrians from the same dataset into different shapes.

Pedestrian Detection

Paper
Add Code

TRB: A Novel Triplet Representation for Understanding 2D Human Body

2 code implementations • ICCV 2019 • Haodong Duan, Kwan-Yee Lin, Sheng Jin, Wentao Liu, Chen Qian, Wanli Ouyang

In this paper, we propose the Triplet Representation for Body (TRB) -- a compact 2D human body representation, with skeleton keypoints capturing human pose information and contour keypoints containing human shape information.

Conditional Image Generation Open-Ended Question Answering

4,971

Paper
Code

Improving One-shot NAS by Suppressing the Posterior Fading

no code implementations • CVPR 2020 • Xiang Li, Chen Lin, Chuming Li, Ming Sun, Wei Wu, Junjie Yan, Wanli Ouyang

In this paper, we analyse existing weight sharing one-shot NAS approaches from a Bayesian point of view and identify the posterior fading problem, which compromises the effectiveness of shared weights.

Neural Architecture Search object-detection +2

Paper
Add Code

IntersectGAN: Learning Domain Intersection for Generating Images with Multiple Attributes

no code implementations • 21 Sep 2019 • Zehui Yao, Boyan Zhang, Zhiyong Wang, Wanli Ouyang, Dong Xu, Dagan Feng

For example, given two image domains $X_1$ and $X_2$ with certain attributes, the intersection $X_1 \cap X_2$ denotes a new domain where images possess the attributes from both $X_1$ and $X_2$ domains.

Attribute

Paper
Add Code

GradNet: Gradient-Guided Network for Visual Object Tracking

2 code implementations • ICCV 2019 • Peixia Li, Bo-Yu Chen, Wanli Ouyang, Dong Wang, Xiaoyun Yang, Huchuan Lu

In this work, we propose a novel gradient-guided network to exploit the discriminative information in gradients and update the template in the siamese network through feed-forward and backward operations.

Ranked #3 on Visual Object Tracking on OTB-2015 (Precision metric)

Object Template Matching +2

Paper
Code

Structured Modeling of Joint Deep Feature and Prediction Refinement for Salient Object Detection

1 code implementation • ICCV 2019 • Yingyue Xu, Dan Xu, Xiaopeng Hong, Wanli Ouyang, Rongrong Ji, Min Xu, Guoying Zhao

We formulate the CRF graphical model that involves message-passing of feature-feature, feature-prediction, and prediction-prediction, from the coarse scale to the finer scale, to update the features and the corresponding predictions.

object-detection RGB Salient Object Detection +1

Paper
Code

Crowd Counting with Deep Structured Scale Integration Network

no code implementations • ICCV 2019 • Lingbo Liu, Zhilin Qiu, Guanbin Li, Shufan Liu, Wanli Ouyang, Liang Lin

Automatic estimation of the number of people in unconstrained crowded scenes is a challenging task and one major difficulty stems from the huge scale variation of people.

Crowd Counting Representation Learning

Paper
Add Code

Improving Description-based Person Re-identification by Multi-granularity Image-text Alignments

no code implementations • 23 Jun 2019 • Kai Niu, Yan Huang, Wanli Ouyang, Liang Wang

Firstly, the global-global alignment in the Global Contrast (GC) module is for matching the global contexts of images and descriptions.

Ranked #19 on Text based Person Retrieval on CUHK-PEDES

Person Re-Identification Text based Person Retrieval

Paper
Add Code

MMDetection: Open MMLab Detection Toolbox and Benchmark

144 code implementations • 17 Jun 2019 • Kai Chen, Jiaqi Wang, Jiangmiao Pang, Yuhang Cao, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jiarui Xu, Zheng Zhang, Dazhi Cheng, Chenchen Zhu, Tianheng Cheng, Qijie Zhao, Buyu Li, Xin Lu, Rui Zhu, Yue Wu, Jifeng Dai, Jingdong Wang, Jianping Shi, Wanli Ouyang, Chen Change Loy, Dahua Lin

In this paper, we introduce the various features of this toolbox.

Benchmarking Instance Segmentation +2

27,716

Paper
Code

Improving Action Localization by Progressive Cross-stream Cooperation

no code implementations • CVPR 2019 • Rui Su, Wanli Ouyang, Luping Zhou, Dong Xu

Specifically, we first generate a larger set of region proposals by combining the latest region proposals from both streams, from which we can readily obtain a larger set of labelled training samples to help learn better action detection models.

Action Classification Action Detection +2

Paper
Add Code

Online Hyper-parameter Learning for Auto-Augmentation Strategy

1 code implementation • ICCV 2019 • Chen Lin, Minghao Guo, Chuming Li, Yuan Xin, Wei Wu, Dahua Lin, Wanli Ouyang, Junjie Yan

Data augmentation is critical to the success of modern deep learning techniques.

Data Augmentation

Paper
Code

AM-LFS: AutoML for Loss Function Search

1 code implementation • ICCV 2019 • Chuming Li, Yuan Xin, Chen Lin, Minghao Guo, Wei Wu, Wanli Ouyang, Junjie Yan

The key contribution of this work is the design of a search space that can guarantee generalization and transferability across different vision tasks by including a number of existing prevailing loss functions in a unified formulation.

AutoML

Paper
Code

Contextualized Spatial-Temporal Network for Taxi Origin-Destination Demand Prediction

no code implementations • 15 May 2019 • Lingbo Liu, Zhilin Qiu, Guanbin Li, Qing Wang, Wanli Ouyang, Liang Lin

Finally, a GCC module is applied to model the correlation between all regions by computing a global correlation feature as a weighted sum of all regional features, with the weights being calculated as the similarity between the corresponding region pairs.

Paper
Add Code

Pruning with Hints: An Efficient Framework for Model Acceleration

no code implementations • ICLR 2019 • Wei Gao, Yi Wei, Quanquan Li, Hongwei Qin, Wanli Ouyang, Junjie Yan

Hints can improve the performance of the student model by transferring knowledge from the teacher model.

Pose Estimation

Paper
Add Code

Box-driven Class-wise Region Masking and Filling Rate Guided Loss for Weakly Supervised Semantic Segmentation

1 code implementation • CVPR 2019 • Chunfeng Song, Yan Huang, Wanli Ouyang, Liang Wang

To address this problem, it is a good choice to learn to segment with weak supervision from bounding boxes.

Weakly-supervised Learning Weakly supervised Semantic Segmentation +1

Paper
Code

Libra R-CNN: Towards Balanced Learning for Object Detection

6 code implementations • CVPR 2019 • Jiangmiao Pang, Kai Chen, Jianping Shi, Huajun Feng, Wanli Ouyang, Dahua Lin

Ranked #149 on Object Detection on COCO test-dev

object-detection Object Detection

27,716

Paper
Code

Feature Intertwiner for Object Detection

2 code implementations • ICLR 2019 • Hongyang Li, Bo Dai, Shaoshuai Shi, Wanli Ouyang, Xiaogang Wang

We argue that the reliable set could guide the feature learning of the less reliable set during training, in the spirit of a student mimicking teacher behavior, thus pushing towards a more compact class centroid in the feature space.

Ranked #134 on Object Detection on COCO test-dev

Object object-detection +1

107

Paper
Code
