Search Results for author: Yongming Rao

Found 46 papers, 36 papers with code

Temporal Coherence or Temporal Motion: Which is More Critical for Video-based Person Re-identification?

no code implementations • ECCV 2020 • Guangyi Chen, Yongming Rao, Jiwen Lu, Jie Zhou

Specifically, we disentangle the video representation into the temporal coherence and motion parts and randomly change the scale of the temporal motion features as the adversarial noise.

Video-Based Person Re-Identification

X-3D: Explicit 3D Structure Modeling for Point Cloud Recognition

1 code implementation • 23 Apr 2024 • Shuofeng Sun, Yongming Rao, Jiwen Lu, Haibin Yan

However, we contend that such an implicit high-dimensional structure modeling approach inadequately represents the local geometric structure of point clouds due to the absence of explicit structural information.

Segmentation

Chain-of-Spot: Interactive Reasoning Improves Large Vision-Language Models

1 code implementation • 19 Mar 2024 • Zuyan Liu, Yuhao Dong, Yongming Rao, Jie Zhou, Jiwen Lu

In the realm of vision-language understanding, the proficiency of models in interpreting and reasoning over visual content has become a cornerstone for numerous applications.

Ranked #44 on Visual Question Answering on MM-Vet

visual instruction following Visual Question Answering

Generative Multimodal Models are In-Context Learners

1 code implementation • 20 Dec 2023 • Quan Sun, Yufeng Cui, Xiaosong Zhang, Fan Zhang, Qiying Yu, Zhengxiong Luo, Yueze Wang, Yongming Rao, Jingjing Liu, Tiejun Huang, Xinlong Wang

The human ability to easily solve multimodal tasks in context (i.e., with only a few demonstrations or simple instructions) is what current multimodal systems have largely struggled to imitate.

Ranked #21 on Visual Question Answering on MM-Vet

In-Context Learning Question Answering +2

1,496

Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior

1 code implementation • 11 Dec 2023 • Fangfu Liu, Diankun Wu, Yi Wei, Yongming Rao, Yueqi Duan

Instead of retraining a costly viewpoint-aware model, we study how to fully exploit easily accessible coarse 3D knowledge to enhance the prompts and guide 2D lifting optimization for refinement.

3D Generation Text to 3D

114

TCOVIS: Temporally Consistent Online Video Instance Segmentation

1 code implementation • ICCV 2023 • Junlong Li, Bingyao Yu, Yongming Rao, Jie Zhou, Jiwen Lu

The core of our method consists of a global instance assignment strategy and a spatio-temporal enhancement module, which improve the temporal consistency of the features from two aspects.

Instance Segmentation Semantic Segmentation +1

Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models

1 code implementation • ICCV 2023 • Ziyi Wang, Xumin Yu, Yongming Rao, Jie Zhou, Jiwen Lu

In this paper, we propose a novel 3D-to-2D generative pre-training method that is adaptable to any point cloud model.

Ranked #6 on 3D Part Segmentation on ShapeNet-Part

3D Part Segmentation 3D Point Cloud Classification

Unleashing Text-to-Image Diffusion Models for Visual Perception

2 code implementations • ICCV 2023 • Wenliang Zhao, Yongming Rao, Zuyan Liu, Benlin Liu, Jie Zhou, Jiwen Lu

In this paper, we propose VPD (Visual Perception with a pre-trained Diffusion model), a new framework that exploits the semantic information of a pre-trained text-to-image diffusion model in visual perception tasks.

Ranked #7 on Referring Expression Segmentation on RefCoCo val

Denoising Image Segmentation +4

7,408

AdaPoinTr: Diverse Point Cloud Completion with Adaptive Geometry-Aware Transformers

1 code implementation • 11 Jan 2023 • Xumin Yu, Yongming Rao, Ziyi Wang, Jiwen Lu, Jie Zhou

In this paper, we present a new method that reformulates point cloud completion as a set-to-set translation problem and design a new model, called PoinTr, which adopts a Transformer encoder-decoder architecture for point cloud completion.

Ranked #2 on Point Cloud Completion on ShapeNet

Denoising Inductive Bias +1

521

DiffSwap: High-Fidelity and Controllable Face Swapping via 3D-Aware Masked Diffusion

1 code implementation • CVPR 2023 • Wenliang Zhao, Yongming Rao, Weikang Shi, Zuyan Liu, Jie Zhou, Jiwen Lu

Unlike previous work that relies on carefully designed network architectures and loss functions to fuse the information from the source and target faces, we reformulate face swapping as a conditional inpainting task, performed by a powerful diffusion model guided by the desired face attributes (e.g., identity and landmarks).

Face Swapping

FLAG3D: A 3D Fitness Activity Dataset with Language Instruction

1 code implementation • CVPR 2023 • Yansong Tang, Jinpeng Liu, Aoyang Liu, Bin Yang, Wenxun Dai, Yongming Rao, Jiwen Lu, Jie Zhou, Xiu Li

With its continuously growing popularity around the world, fitness activity analytics has become an emerging research topic in computer vision.

Action Generation Action Recognition +2

PLOT: Prompt Learning with Optimal Transport for Vision-Language Models

1 code implementation • 3 Oct 2022 • Guangyi Chen, Weiran Yao, Xiangchen Song, Xinyue Li, Yongming Rao, Kun Zhang

To solve this problem, we propose to apply optimal transport to match the vision and text modalities.

112

P2P: Tuning Pre-trained Image Models for Point Cloud Analysis with Point-to-Pixel Prompting

1 code implementation • 4 Aug 2022 • Ziyi Wang, Xumin Yu, Yongming Rao, Jie Zhou, Jiwen Lu

Nowadays, pre-training big models on large-scale datasets has become a crucial topic in deep learning.

Ranked #18 on 3D Point Cloud Classification on ScanObjectNN (using extra training data)

3D Part Segmentation 3D Point Cloud Classification

117

HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions

7 code implementations • 28 Jul 2022 • Yongming Rao, Wenliang Zhao, Yansong Tang, Jie Zhou, Ser-Nam Lim, Jiwen Lu

In this paper, we show that the key ingredients behind the vision Transformers, namely input-adaptive, long-range and high-order spatial interactions, can also be efficiently implemented with a convolution-based framework.

Ranked #20 on Semantic Segmentation on ADE20K

Image Classification Object Detection +2

3,157

Dynamic Spatial Sparsification for Efficient Vision Transformers and Convolutional Neural Networks

1 code implementation • 4 Jul 2022 • Yongming Rao, Zuyan Liu, Wenliang Zhao, Jie Zhou, Jiwen Lu

We extend our method to hierarchical models including CNNs and hierarchical vision Transformers as well as more complex dense prediction tasks that require structured feature maps by formulating a more generic dynamic spatial sparsification framework with progressive sparsification and asymmetric computation for different spatial locations.

532

SemAffiNet: Semantic-Affine Transformation for Point Cloud Segmentation

1 code implementation • CVPR 2022 • Ziyi Wang, Yongming Rao, Xumin Yu, Jie Zhou, Jiwen Lu

Conventional point cloud semantic segmentation methods usually employ an encoder-decoder architecture, where mid-level features are locally aggregated to extract geometric information.

Image Segmentation Point Cloud Segmentation +2

FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment

1 code implementation • CVPR 2022 • Jinglin Xu, Yongming Rao, Xumin Yu, Guangyi Chen, Jie Zhou, Jiwen Lu

Most existing action quality assessment methods rely on the deep features of an entire video to predict the score, which is less reliable due to the non-transparent inference process and poor interpretability.

Action Quality Assessment

SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation

1 code implementation • 7 Apr 2022 • Yi Wei, Linqing Zhao, Wenzhao Zheng, Zheng Zhu, Yongming Rao, Guan Huang, Jiwen Lu, Jie Zhou

In this paper, we propose a SurroundDepth method to incorporate the information from multiple surrounding views to predict depth maps across cameras.

Autonomous Driving Monocular Depth Estimation

236

LiDAR Distillation: Bridging the Beam-Induced Domain Gap for 3D Object Detection

1 code implementation • 28 Mar 2022 • Yi Wei, Zibu Wei, Yongming Rao, Jiaxin Li, Jie Zhou, Jiwen Lu

In this paper, we propose the LiDAR Distillation to bridge the domain gap induced by different LiDAR beams for 3D object detection.

3D Object Detection object-detection

A Roadmap for Big Model

no code implementations • 26 Mar 2022 • Sha Yuan, Hanyu Zhao, Shuai Zhao, Jiahong Leng, Yangxiao Liang, Xiaozhi Wang, Jifan Yu, Xin Lv, Zhou Shao, Jiaao He, Yankai Lin, Xu Han, Zhenghao Liu, Ning Ding, Yongming Rao, Yizhao Gao, Liang Zhang, Ming Ding, Cong Fang, Yisen Wang, Mingsheng Long, Jing Zhang, Yinpeng Dong, Tianyu Pang, Peng Cui, Lingxiao Huang, Zheng Liang, HuaWei Shen, HUI ZHANG, Quanshi Zhang, Qingxiu Dong, Zhixing Tan, Mingxuan Wang, Shuo Wang, Long Zhou, Haoran Li, Junwei Bao, Yingwei Pan, Weinan Zhang, Zhou Yu, Rui Yan, Chence Shi, Minghao Xu, Zuobai Zhang, Guoqiang Wang, Xiang Pan, Mengjie Li, Xiaoyu Chu, Zijun Yao, Fangwei Zhu, Shulin Cao, Weicheng Xue, Zixuan Ma, Zhengyan Zhang, Shengding Hu, Yujia Qin, Chaojun Xiao, Zheni Zeng, Ganqu Cui, Weize Chen, Weilin Zhao, Yuan YAO, Peng Li, Wenzhao Zheng, Wenliang Zhao, Ziyi Wang, Borui Zhang, Nanyi Fei, Anwen Hu, Zenan Ling, Haoyang Li, Boxi Cao, Xianpei Han, Weidong Zhan, Baobao Chang, Hao Sun, Jiawen Deng, Chujie Zheng, Juanzi Li, Lei Hou, Xigang Cao, Jidong Zhai, Zhiyuan Liu, Maosong Sun, Jiwen Lu, Zhiwu Lu, Qin Jin, Ruihua Song, Ji-Rong Wen, Zhouchen Lin, LiWei Wang, Hang Su, Jun Zhu, Zhifang Sui, Jiajun Zhang, Yang Liu, Xiaodong He, Minlie Huang, Jian Tang, Jie Tang

With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks becomes a popular paradigm.

Language Modelling Machine Translation +1

Stochastic Trajectory Prediction via Motion Indeterminacy Diffusion

2 code implementations • CVPR 2022 • Tianpei Gu, Guangyi Chen, Junlong Li, Chunze Lin, Yongming Rao, Jie Zhou, Jiwen Lu

Human behavior has the nature of indeterminacy, which requires the pedestrian trajectory prediction system to model the multi-modality of future motion states.

Pedestrian Trajectory Prediction Trajectory Prediction

161

Back to Reality: Weakly-supervised 3D Object Detection with Shape-guided Label Enhancement

2 code implementations • CVPR 2022 • Xiuwei Xu, Yifan Wang, Yu Zheng, Yongming Rao, Jie Zhou, Jiwen Lu

In this paper, we propose a weakly-supervised approach for 3D object detection, which makes it possible to train a strong 3D detector with position-level annotations (i.e., annotations of object centers).

3D Object Detection Domain Adaptation +3

Multi-View Partial (MVP) Point Cloud Challenge 2021 on Completion and Registration: Methods and Results

2 code implementations • 22 Dec 2021 • Liang Pan, Tong Wu, Zhongang Cai, Ziwei Liu, Xumin Yu, Yongming Rao, Jiwen Lu, Jie Zhou, Mingye Xu, Xiaoyuan Luo, Kexue Fu, Peng Gao, Manning Wang, Yali Wang, Yu Qiao, Junsheng Zhou, Xin Wen, Peng Xiang, Yu-Shen Liu, Zhizhong Han, Yuanjie Yan, Junyi An, Lifa Zhu, Changwei Lin, Dongrui Liu, Xin Li, Francisco Gómez-Fernández, Qinlong Wang, Yang Yang

Based on the MVP dataset, this paper reports methods and results in the Multi-View Partial Point Cloud Challenge 2021 on Completion and Registration.

3D Reconstruction Point Cloud Completion +2

153

DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting

1 code implementation • CVPR 2022 • Yongming Rao, Wenliang Zhao, Guangyi Chen, Yansong Tang, Zheng Zhu, Guan Huang, Jie Zhou, Jiwen Lu

In this work, we present a new framework for dense prediction by implicitly and explicitly leveraging the pre-trained knowledge from CLIP.

Image-text matching Instance Segmentation +6

489

Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling

2 code implementations • CVPR 2022 • Xumin Yu, Lulu Tang, Yongming Rao, Tiejun Huang, Jie Zhou, Jiwen Lu

Inspired by BERT, we devise a Masked Point Modeling (MPM) task to pre-train point cloud Transformers.

Ranked #13 on Few-Shot 3D Point Cloud Classification on ModelNet40 5-way (10-shot) (using extra training data)

3D Point Cloud Linear Classification Few-Shot 3D Point Cloud Classification +2

490

Structure-Preserving Image Super-Resolution

1 code implementation • 26 Sep 2021 • Cheng Ma, Yongming Rao, Jiwen Lu, Jie Zhou

Firstly, we propose SPSR with gradient guidance (SPSR-G) by exploiting gradient maps of images to guide the recovery in two aspects.

Image Super-Resolution SSIM

441

NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-view Stereo

1 code implementation • ICCV 2021 • Yi Wei, Shaohui Liu, Yongming Rao, Wang Zhao, Jiwen Lu, Jie Zhou

In this work, we present a new multi-view depth estimation method that utilizes both conventional reconstruction and learning-based priors over the recently proposed neural radiance fields (NeRF).

Depth Estimation

434

PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers

1 code implementation • ICCV 2021 • Xumin Yu, Yongming Rao, Ziyi Wang, Zuyan Liu, Jiwen Lu, Jie Zhou

In this paper, we present a new method that reformulates point cloud completion as a set-to-set translation problem and design a new model, called PoinTr that adopts a transformer encoder-decoder architecture for point cloud completion.

Ranked #1 on Point Cloud Completion on ShapeNet (Chamfer Distance L2 metric)

Inductive Bias Point Cloud Completion +1

521

Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-identification

1 code implementation • ICCV 2021 • Yongming Rao, Guangyi Chen, Jiwen Lu, Jie Zhou

Unlike most existing methods that learn visual attention based on conventional likelihood, we propose to learn the attention with counterfactual causality, which provides a tool to measure the attention quality and a powerful supervisory signal to guide the learning process.

Ranked #8 on Vehicle Re-Identification on VehicleID Medium

Causal Inference counterfactual +6

140

RandomRooms: Unsupervised Pre-training from Synthetic Shapes and Randomized Layouts for 3D Object Detection

2 code implementations • ICCV 2021 • Yongming Rao, Benlin Liu, Yi Wei, Jiwen Lu, Cho-Jui Hsieh, Jie Zhou

In particular, we propose to generate random layouts of a scene by making use of the objects in the synthetic CAD dataset and learn the 3D scene representation by applying object-level contrastive learning on two random scenes generated from the same set of synthetic objects.

3D Object Detection Contrastive Learning +3

Group-aware Contrastive Regression for Action Quality Assessment

1 code implementation • ICCV 2021 • Xumin Yu, Yongming Rao, Wenliang Zhao, Jiwen Lu, Jie Zhou

Assessing action quality is challenging due to the subtle differences between videos and large variations in scores.

Ranked #2 on Action Quality Assessment on MTL-AQA

Action Quality Assessment regression

Towards Interpretable Deep Metric Learning with Structural Matching

1 code implementation • ICCV 2021 • Wenliang Zhao, Yongming Rao, Ziyi Wang, Jiwen Lu, Jie Zhou

Our method is model-agnostic, which can be applied to off-the-shelf backbone networks and metric learning methods.

Ranked #16 on Metric Learning on CUB-200-2011

Metric Learning

Global Filter Networks for Image Classification

4 code implementations • NeurIPS 2021 • Yongming Rao, Wenliang Zhao, Zheng Zhu, Jiwen Lu, Jie Zhou

Recent advances in self-attention and pure multi-layer perceptrons (MLP) models for vision have shown great potential in achieving promising performance with fewer inductive biases.

Ranked #9 on Image Classification on Stanford Cars (using extra training data)

Classification Domain Generalization +1

391

DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification

1 code implementation • NeurIPS 2021 • Yongming Rao, Wenliang Zhao, Benlin Liu, Jiwen Lu, Jie Zhou, Cho-Jui Hsieh

Based on this observation, we propose a dynamic token sparsification framework to prune redundant tokens progressively and dynamically based on the input.

Ranked #3 on Efficient ViTs on ImageNet-1K (With LV-ViT-S)

Blocking Efficient ViTs

532

PV-RAFT: Point-Voxel Correlation Fields for Scene Flow Estimation of Point Clouds

1 code implementation • CVPR 2021 • Yi Wei, Ziyi Wang, Yongming Rao, Jiwen Lu, Jie Zhou

In this paper, we propose a Point-Voxel Recurrent All-Pairs Field Transforms (PV-RAFT) method to estimate scene flow from point clouds.

Scene Flow Estimation

MetaDistiller: Network Self-Boosting via Meta-Learned Top-Down Distillation

no code implementations • ECCV 2020 • Benlin Liu, Yongming Rao, Jiwen Lu, Jie Zhou, Cho-Jui Hsieh

Knowledge Distillation (KD) has been one of the most popular methods to learn a compact model.

Knowledge Distillation Meta-Learning

Global-Local Bidirectional Reasoning for Unsupervised Representation Learning of 3D Point Clouds

1 code implementation • CVPR 2020 • Yongming Rao, Jiwen Lu, Jie Zhou

Based on this hypothesis, we propose to learn point cloud representation by bidirectional reasoning between the local structures at different abstraction hierarchies and the global shape without human supervision.

3D Object Classification General Classification +2

113

Deep Face Super-Resolution with Iterative Collaboration between Attentive Recovery and Landmark Estimation

1 code implementation • CVPR 2020 • Cheng Ma, Zhenyu Jiang, Yongming Rao, Jiwen Lu, Jie Zhou

In this paper, we propose a deep face super-resolution (FSR) method with iterative collaboration between two recurrent networks which focus on facial image recovery and landmark estimation respectively.

Super-Resolution

293

Structure-Preserving Super Resolution with Gradient Guidance

2 code implementations • CVPR 2020 • Cheng Ma, Yongming Rao, Yean Cheng, Ce Chen, Jiwen Lu, Jie Zhou

In this paper, we propose a structure-preserving super resolution method to alleviate the above issue while maintaining the merits of GAN-based methods to generate perceptual-pleasant details.

Ranked #46 on Image Super-Resolution on Urban100 - 4x upscaling

Generative Adversarial Network Image Super-Resolution +1

441

P$^2$GNet: Pose-Guided Point Cloud Generating Networks for 6-DoF Object Pose Estimation

no code implementations • 19 Dec 2019 • Peiyu Yu, Yongming Rao, Jiwen Lu, Jie Zhou

Humans are able to perform fast and accurate object pose estimation even under severe occlusion by exploiting learned object model priors from everyday life.

6D Pose Estimation 6D Pose Estimation using RGB +1

Spherical Fractal Convolutional Neural Networks for Point Cloud Recognition

no code implementations • CVPR 2019 • Yongming Rao, Jiwen Lu, Jie Zhou

We present a generic, flexible and 3D rotation invariant framework based on spherical symmetry for point cloud recognition.

Ranked #44 on 3D Part Segmentation on ShapeNet-Part

3D Object Classification 3D Part Segmentation +2

COIN: A Large-scale Dataset for Comprehensive Instructional Video Analysis

no code implementations • CVPR 2019 • Yansong Tang, Dajun Ding, Yongming Rao, Yu Zheng, Danyang Zhang, Lili Zhao, Jiwen Lu, Jie Zhou

There are a substantial number of instructional videos on the Internet, which enable us to acquire knowledge for completing various tasks.

Action Detection

Learning Globally Optimized Object Detector via Policy Gradient

no code implementations • CVPR 2018 • Yongming Rao, Dahua Lin, Jiwen Lu, Jie Zhou

In this paper, we propose a simple yet effective method to learn a globally optimized detector for object detection, which is a simple modification to the standard cross-entropy gradient inspired by the REINFORCE algorithm.

Object object-detection +1

Runtime Neural Pruning

no code implementations • NeurIPS 2017 • Ji Lin, Yongming Rao, Jiwen Lu, Jie Zhou

In this paper, we propose a Runtime Neural Pruning (RNP) framework which prunes the deep neural network dynamically at the runtime.

Learning Discriminative Aggregation Network for Video-Based Face Recognition

no code implementations • ICCV 2017 • Yongming Rao, Ji Lin, Jiwen Lu, Jie Zhou

In this paper, we propose a discriminative aggregation network (DAN) for video face recognition, which aims to integrate information from video frames effectively and efficiently.

Face Recognition Metric Learning

Attention-Aware Deep Reinforcement Learning for Video Face Recognition

no code implementations • ICCV 2017 • Yongming Rao, Jiwen Lu, Jie Zhou

In this paper, we propose an attention-aware deep reinforcement learning (ADRL) method for video face recognition, which aims to discard the misleading and confounding frames and find the focuses of attention in face videos for person recognition.

Face Recognition Person Recognition +2
