Search Results for author: Zhaoxiang Zhang

Found 94 papers, 52 papers with code

MaterialSeg3D: Segmenting Dense Materials from 2D Priors for 3D Assets

no code implementations • 22 Apr 2024 • Zeyu Li, Ruitong Gan, Chuanchen Luo, Yuxi Wang, Jiaheng Liu, Ziwei Zhu, Man Zhang, Qing Li, XuCheng Yin, Zhaoxiang Zhang, Junran Peng

Driven by powerful image diffusion models, recent research has achieved the automatic creation of 3D objects from textual or visual guidance.

Robust Depth Enhancement via Polarization Prompt Fusion Tuning

no code implementations • 5 Apr 2024 • Kei Ikemura, Yiming Huang, Felix Heide, Zhaoxiang Zhang, Qifeng Chen, Chenyang Lei

Existing depth sensors are imperfect and may provide inaccurate depth values in challenging scenarios, such as in the presence of transparent or reflective objects.

CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians

no code implementations • 1 Apr 2024 • Yang Liu, He Guan, Chuanchen Luo, Lue Fan, Junran Peng, Zhaoxiang Zhang

The advancement of real-time 3D scene reconstruction and novel view synthesis has been significantly propelled by 3D Gaussian Splatting (3DGS).

3D Scene Reconstruction Novel View Synthesis

Enhancing Visual Continual Learning with Language-Guided Supervision

no code implementations • 24 Mar 2024 • Bolin Ni, Hongbo Zhao, Chenghao Zhang, Ke Hu, Gaofeng Meng, Zhaoxiang Zhang, Shiming Xiang

Existing methods commonly utilize the one-hot labels and randomly initialize the classifier head.

Class Incremental Learning Incremental Learning +1

SceneX: Procedural Controllable Large-scale Scene Generation via Large-language Models

no code implementations • 23 Mar 2024 • Mengqi Zhou, Jun Hou, Chuanchen Luo, Yuxi Wang, Zhaoxiang Zhang, Junran Peng

Due to its great application potential, large-scale scene generation has drawn extensive attention in academia and industry.

Language Modelling Large Language Model +1

Generative Active Learning for Image Synthesis Personalization

1 code implementation • 22 Mar 2024 • Xulu Zhang, WengYu Zhang, Xiao-Yong Wei, Jinlin Wu, Zhaoxiang Zhang, Zhen Lei, Qing Li

The primary challenge in conducting active learning on generative models lies in the open-ended nature of querying, which differs from the closed form of querying in discriminative models that typically target a single concept.

Active Learning Image Generation

Continual Forgetting for Pre-trained Vision Models

2 code implementations • 18 Mar 2024 • Hongbo Zhao, Bolin Ni, Haochen Wang, Junsong Fan, Fei Zhu, Yuxi Wang, Yuntao Chen, Gaofeng Meng, Zhaoxiang Zhang

(i) For unwanted knowledge, efficient and effective deleting is crucial.

Continual Forgetting Face Recognition +3

Open-world Machine Learning: A Review and New Outlooks

no code implementations • 4 Mar 2024 • Fei Zhu, Shijie Ma, Zhen Cheng, Xu-Yao Zhang, Zhaoxiang Zhang, Cheng-Lin Liu

This paper aims to provide a comprehensive introduction to the emerging open-world machine learning paradigm, to help researchers build more powerful AI systems in their respective fields, and to promote the development of artificial general intelligence.

Class Incremental Learning Incremental Learning +1

MemoNav: Working Memory Model for Visual Navigation

1 code implementation • 29 Feb 2024 • Hongxin Li, Zeyu Wang, Xu Yang, Yuran Yang, Shuqi Mei, Zhaoxiang Zhang

Subsequently, a graph attention module encodes the retained STM and the LTM to generate working memory (WM) which contains the scene features essential for efficient navigation.

Decision Making Graph Attention +2

DiffSpeaker: Speech-Driven 3D Facial Animation with Diffusion Transformer

1 code implementation • 8 Feb 2024 • Zhiyuan Ma, Xiangyu Zhu, GuoJun Qi, Chen Qian, Zhaoxiang Zhang, Zhen Lei

We suspect this is due to a shortage of paired audio-4D data, which is crucial for the Transformer to effectively perform as a denoiser within the Diffusion framework.

Segment Anything in 3D Gaussians

no code implementations • 31 Jan 2024 • Xu Hu, Yuxi Wang, Lue Fan, Junsong Fan, Junran Peng, Zhen Lei, Qing Li, Zhaoxiang Zhang

In this paper, we propose a novel approach to achieve object segmentation in 3D Gaussian via an interactive procedure without any training process and learned parameters.

Segmentation Semantic Segmentation

MixSup: Mixed-grained Supervision for Label-efficient LiDAR-based 3D Object Detection

1 code implementation • 29 Jan 2024 • Yuxue Yang, Lue Fan, Zhaoxiang Zhang

Thus, MixSup leverages massive coarse cluster-level labels to learn semantics and a few expensive box-level labels to learn accurate poses and shapes.

3D Object Detection object-detection

Seek for Incantations: Towards Accurate Text-to-Image Diffusion Synthesis through Prompt Engineering

no code implementations • 12 Jan 2024 • Chang Yu, Junran Peng, Xiangyu Zhu, Zhaoxiang Zhang, Qi Tian, Zhen Lei

The text-to-image synthesis by diffusion models has recently shown remarkable performance in generating high-quality images.

Image Generation Prompt Engineering

FurniScene: A Large-scale 3D Room Dataset with Intricate Furnishing Scenes

no code implementations • 7 Jan 2024 • Genghao Zhang, Yuxi Wang, Chuanchen Luo, Shibiao Xu, Junran Peng, Zhaoxiang Zhang, Man Zhang

Indoor scene generation has attracted significant attention recently as it is crucial for applications of gaming, virtual reality, and interior design.

Scene Generation

Pareto-based Multi-Objective Recommender System with Forgetting Curve

no code implementations • 28 Dec 2023 • Jipeng Jin, Zhaoxiang Zhang, Zhiheng Li, Xiaofeng Gao, Xiongwen Yang, Lei Xiao, Jie Jiang

Considering recency effect in memories, we propose a forgetting model based on Ebbinghaus Forgetting Curve to cope with negative feedback.

Recommendation Systems

Bootstrap Masked Visual Modeling via Hard Patches Mining

1 code implementation • 21 Dec 2023 • Haochen Wang, Junsong Fan, Yuxi Wang, Kaiyou Song, Tiancai Wang, Xiangyu Zhang, Zhaoxiang Zhang

To empower the model as a teacher, we propose Hard Patches Mining (HPM), predicting patch-wise losses and subsequently determining where to mask.

Compositional Inversion for Stable Diffusion Models

1 code implementation • 13 Dec 2023 • Xulu Zhang, Xiao-Yong Wei, Jinlin Wu, Tianyi Zhang, Zhaoxiang Zhang, Zhen Lei, Qing Li

It stems from the fact that during inversion, the irrelevant semantics in the user images are also encoded, forcing the inverted concepts to occupy locations far from the core distribution in the embedding space.

GPT4SGG: Synthesizing Scene Graphs from Holistic and Region-specific Narratives

no code implementations • 7 Dec 2023 • Zuyao Chen, Jinlin Wu, Zhen Lei, Zhaoxiang Zhang, Changwen Chen

Learning scene graphs from natural language descriptions has proven to be a cheap and promising scheme for Scene Graph Generation (SGG).

Graph Generation Scene Graph Generation +1

Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving

1 code implementation • 29 Nov 2023 • Yuqi Wang, JiaWei He, Lue Fan, Hongxin Li, Yuntao Chen, Zhaoxiang Zhang

In autonomous driving, predicting future events in advance and evaluating the foreseeable risks empowers autonomous vehicles to better plan their actions, enhancing safety and efficiency on the road.

Autonomous Driving

Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention

no code implementations • 18 Nov 2023 • Zuyao Chen, Jinlin Wu, Zhen Lei, Zhaoxiang Zhang, Changwen Chen

For the more challenging settings of relation-involved open vocabulary SGG, the proposed approach integrates relation-aware pre-training utilizing image-caption data and retains visual-concept alignment through knowledge distillation.

Concept Alignment Graph Generation +6

Visual Commonsense based Heterogeneous Graph Contrastive Learning

no code implementations • 11 Nov 2023 • Zongzhao Li, Xiangyu Zhu, Xi Zhang, Zhaoxiang Zhang, Zhen Lei

Specifically, our model contains two key components: the Commonsense-based Contrastive Learning and the Graph Relation Network.

Contrastive Learning Question Answering +4

RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models

1 code implementation • 1 Oct 2023 • Zekun Moore Wang, Zhongyuan Peng, Haoran Que, Jiaheng Liu, Wangchunshu Zhou, Yuhan Wu, Hongcheng Guo, Ruitong Gan, Zehao Ni, Man Zhang, Zhaoxiang Zhang, Wanli Ouyang, Ke Xu, Wenhu Chen, Jie Fu, Junran Peng

The advent of Large Language Models (LLMs) has paved the way for complex tasks such as role-playing, which enhances user interactions by enabling models to imitate various characters.

Benchmarking

Informative Data Mining for One-Shot Cross-Domain Semantic Segmentation

no code implementations • ICCV 2023 • Yuxi Wang, Jian Liang, Jun Xiao, Shuqi Mei, Yuran Yang, Zhaoxiang Zhang

One-shot domain adaptation methods attempt to overcome these challenges by transferring the pre-trained source model to the target domain using only one target data.

Domain Adaptation Semantic Segmentation +1

DropPos: Pre-Training Vision Transformers by Reconstructing Dropped Positions

1 code implementation • NeurIPS 2023 • Haochen Wang, Junsong Fan, Yuxi Wang, Kaiyou Song, Tong Wang, Zhaoxiang Zhang

As it is empirically observed that Vision Transformers (ViTs) are quite insensitive to the order of input tokens, the need for an appropriate self-supervised pretext task that enhances the location awareness of ViTs is becoming evident.

Position

Bootstrap Fine-Grained Vision-Language Alignment for Unified Zero-Shot Anomaly Localization

1 code implementation • 30 Aug 2023 • Hanqiu Deng, Zhaoxiang Zhang, Jinan Bao, Xingyu Li

On top of the proposed AnoCLIP, we further introduce a test-time adaptation (TTA) mechanism to refine visual anomaly localization results, where we optimize a lightweight adapter in the visual encoder using AnoCLIP's pseudo-labels and noise-corrupted tokens.

Anomaly Detection Test-time Adaptation +1

FSD V2: Improving Fully Sparse 3D Object Detection with Virtual Voxels

2 code implementations • 7 Aug 2023 • Lue Fan, Feng Wang, Naiyan Wang, Zhaoxiang Zhang

Consequently, we develop a suite of components to complement the virtual voxel concept, including a virtual voxel encoder, a virtual voxel mixer, and a virtual voxel assignment strategy.

3D Object Detection Clustering +4

DiffusePast: Diffusion-based Generative Replay for Class Incremental Semantic Segmentation

no code implementations • 2 Aug 2023 • Jingfan Chen, Yuxi Wang, Pengfei Wang, Xiao Chen, Zhaoxiang Zhang, Zhen Lei, Qing Li

The Class Incremental Semantic Segmentation (CISS) extends the traditional segmentation task by incrementally learning newly added classes.

Class-Incremental Semantic Segmentation Segmentation

Echoes Beyond Points: Unleashing the Power of Raw Radar Data in Multi-modality Fusion

1 code implementation • NeurIPS 2023 • Yang Liu, Feng Wang, Naiyan Wang, Zhaoxiang Zhang

Radar is ubiquitous in autonomous driving systems due to its low cost and good adaptability to bad weather.

Autonomous Driving Point Cloud Generation

DDG-Net: Discriminability-Driven Graph Network for Weakly-supervised Temporal Action Localization

1 code implementation • ICCV 2023 • Xiaojun Tang, Junsong Fan, Chuanchen Luo, Zhaoxiang Zhang, Man Zhang, Zongyuan Yang

Considering this phenomenon, we propose Discriminability-Driven Graph Network (DDG-Net), which explicitly models ambiguous snippets and discriminative snippets with well-designed connections, preventing the transmission of ambiguous information and enhancing the discriminability of snippet-level representations.

Weakly-supervised Temporal Action Localization Weakly Supervised Temporal Action Localization

BMAD: Benchmarks for Medical Anomaly Detection

1 code implementation • 20 Jun 2023 • Jinan Bao, Hanshi Sun, Hanqiu Deng, Yinsheng He, Zhaoxiang Zhang, Xingyu Li

However, there is a lack of a universal and fair benchmark for evaluating AD methods on medical images, which hinders the development of more generalized and robust AD methods in this specific domain.

Anomaly Detection Medical Diagnosis

Visually-Guided Sound Source Separation with Audio-Visual Predictive Coding

1 code implementation • 19 Jun 2023 • Zengjie Song, Zhaoxiang Zhang

The framework of visually-guided sound source separation generally consists of three parts: visual feature extraction, multimodal feature fusion, and sound signal processing.

valid Visually Guided Sound Source Separation

PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation

1 code implementation • 16 Jun 2023 • Yuqi Wang, Yuntao Chen, Xingyu Liao, Lue Fan, Zhaoxiang Zhang

In this work, we address this limitation by studying camera-based 3D panoptic segmentation, aiming to achieve a unified occupancy representation for camera-only 3D scene understanding.

Autonomous Driving object-detection +5

Tracking Objects with 3D Representation from Videos

no code implementations • 8 Jun 2023 • JiaWei He, Lue Fan, Yuqi Wang, Yuntao Chen, Zehao Huang, Naiyan Wang, Zhaoxiang Zhang

In this paper, we rethink the data association in 2D MOT and utilize the 3D object representation to separate each object in the feature space.

Multiple Object Tracking Object +1

Weakly Supervised 3D Object Detection with Multi-Stage Generalization

no code implementations • 8 Jun 2023 • JiaWei He, Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang

We devise the DoubleClustering algorithm to obtain object clusters from reconstructed scene-level points, and further enhance the model's detection capabilities by developing three stages of generalization: progressing from complete to partial, static to dynamic, and close to distant.

3D Reconstruction Monocular 3D Object Detection +3

Using Unreliable Pseudo-Labels for Label-Efficient Semantic Segmentation

no code implementations • 4 Jun 2023 • Haochen Wang, Yuchao Wang, Yujun Shen, Junsong Fan, Yuxi Wang, Zhaoxiang Zhang

A common practice is to select the highly confident predictions as the pseudo-ground-truths for each pixel, but it leads to a problem that most pixels may be left unused due to their unreliability.

Semantic Segmentation

Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory

1 code implementation • 25 May 2023 • Xizhou Zhu, Yuntao Chen, Hao Tian, Chenxin Tao, Weijie Su, Chenyu Yang, Gao Huang, Bin Li, Lewei Lu, Xiaogang Wang, Yu Qiao, Zhaoxiang Zhang, Jifeng Dai

These agents, equipped with the logic and common sense capabilities of LLMs, can skillfully navigate complex, sparse-reward environments with text-based interactions.

Common Sense Reasoning Navigate +1

Pulling Target to Source: A New Perspective on Domain Adaptive Semantic Segmentation

no code implementations • 23 May 2023 • Haochen Wang, Yujun Shen, Jingjing Fei, Wei Li, Liwei Wu, Yuxi Wang, Zhaoxiang Zhang

To this end, we propose T2S-DA, which we interpret as a form of pulling Target to Source for Domain Adaptation, encouraging the model in learning similar cross-domain features.

Domain Generalization Semantic Segmentation

Real-Aug: Realistic Scene Synthesis for LiDAR Augmentation in 3D Object Detection

no code implementations • 22 May 2023 • Jinglin Zhan, Tiejun Liu, RenGang Li, Jingwei Zhang, Zhaoxiang Zhang, Yuntao Chen

Data and model are the undoubtable two supporting pillars for LiDAR object detection.

3D Object Detection Data Augmentation +1

Fully Sparse Fusion for 3D Object Detection

1 code implementation • 24 Apr 2023 • Yingyan Li, Lue Fan, Yang Liu, Zehao Huang, Yuntao Chen, Naiyan Wang, Zhaoxiang Zhang, Tieniu Tan

In this paper, we study how to effectively leverage image modality in the emerging fully sparse architecture.

3D Instance Segmentation 3D Object Detection +3

Once Detected, Never Lost: Surpassing Human Performance in Offline LiDAR based 3D Object Detection

2 code implementations • ICCV 2023 • Lue Fan, Yuxue Yang, Yiming Mao, Feng Wang, Yuntao Chen, Naiyan Wang, Zhaoxiang Zhang

Drawing inspiration from this, we propose a high-performance offline detector in a track-centric perspective instead of the conventional object-centric perspective.

3D Object Detection Object +1

Hard Patches Mining for Masked Image Modeling

1 code implementation • CVPR 2023 • Haochen Wang, Kaiyou Song, Junsong Fan, Yuxi Wang, Jin Xie, Zhaoxiang Zhang

We observe that the reconstruction loss can naturally be the metric of the difficulty of the pre-training task.

3D Video Object Detection with Learnable Object-Centric Global Optimization

1 code implementation • CVPR 2023 • JiaWei He, Yuntao Chen, Naiyan Wang, Zhaoxiang Zhang

We explore long-term temporal visual correspondence-based optimization for 3D video object detection in this work.

3D Scene Reconstruction Object +2

Learnable Graph Matching: A Practical Paradigm for Data Association

1 code implementation • 27 Mar 2023 • JiaWei He, Zehao Huang, Naiyan Wang, Zhaoxiang Zhang

Data association is at the core of many computer vision tasks, e.g., multiple object tracking, image matching, and point cloud registration.

Graph Matching Multiple Object Tracking +1

Graphics Capsule: Learning Hierarchical 3D Face Representations from 2D Images

no code implementations • CVPR 2023 • Chang Yu, Xiangyu Zhu, Xiaomei Zhang, Zhaoxiang Zhang, Zhen Lei

The function of constructing the hierarchy of objects is important to the visual process of the human brain.

Face Recognition

Sharpness-Aware Gradient Matching for Domain Generalization

1 code implementation • CVPR 2023 • Pengfei Wang, Zhaoxiang Zhang, Zhen Lei, Lei Zhang

In this paper, we present two conditions to ensure that the model could converge to a flat minimum with a small loss, and present an algorithm, named Sharpness-Aware Gradient Matching (SAGM), to meet the two conditions for improving model generalization capability.

Domain Generalization

A Survey of Deep Visual Cross-Domain Few-Shot Learning

no code implementations • 16 Mar 2023 • Wenjian Wang, Lijuan Duan, Yuxi Wang, Junsong Fan, Zhi Gong, Zhaoxiang Zhang

Research into Cross-Domain Few-Shot (CDFS) has emerged to address this issue, forming a more challenging and realistic setting.

cross-domain few-shot learning Transfer Learning

Blind Video Deflickering by Neural Filtering with a Flawed Atlas

1 code implementation • CVPR 2023 • Chenyang Lei, Xuanchi Ren, Zhaoxiang Zhang, Qifeng Chen

Prior work usually requires specific guidance such as the flickering frequency, manual annotations, or extra consistent videos to remove the flicker.

Video Generation Video Temporal Consistency

LMR: A Large-Scale Multi-Reference Dataset for Reference-based Super-Resolution

1 code implementation • ICCV 2023 • Lin Zhang, Xin Li, Dongliang He, Errui Ding, Zhaoxiang Zhang

To this end, we construct a large-scale, multi-reference super-resolution dataset, named LMR.

feature selection Image Super-Resolution +1

Intrinsic Physical Concepts Discovery with Object-Centric Predictive Models

no code implementations • CVPR 2023 • Qu Tang, Xiangyu Zhu, Zhen Lei, Zhaoxiang Zhang

The ability to discover abstract physical concepts and understand how they work in the world through observing lies at the core of human intelligence.

Fairly Adaptive Negative Sampling for Recommendations

no code implementations • 16 Feb 2023 • Xiao Chen, Wenqi Fan, Jingfan Chen, Haochen Liu, Zitao Liu, Zhaoxiang Zhang, Qing Li

Pairwise learning strategies are prevalent for optimizing recommendation models on implicit feedback data, which usually learns user preference by discriminating between positive (i.e., clicked by a user) and negative items (i.e., obtained by negative sampling).

Attribute Fairness

FrustumFormer: Adaptive Instance-aware Resampling for Multi-view 3D Detection

1 code implementation • CVPR 2023 • Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang

The transformation of features from 2D perspective space to 3D space is essential to multi-view 3D object detection.

3D Object Detection object-detection

Super Sparse 3D Object Detection

2 code implementations • 5 Jan 2023 • Lue Fan, Yuxue Yang, Feng Wang, Naiyan Wang, Zhaoxiang Zhang

To enable efficient long-range detection, we first propose a fully sparse object detector termed FSD.

3D Object Detection Autonomous Driving +2

FPR: False Positive Rectification for Weakly Supervised Semantic Segmentation

1 code implementation • ICCV 2023 • Liyi Chen, Chenyang Lei, Ruihuang Li, Shuai Li, Zhaoxiang Zhang, Lei Zhang

Without introducing any external supervision and human priors, the proposed FPR effectively suppresses wrong activations from the background objects.

Weakly supervised Semantic Segmentation Weakly-Supervised Semantic Segmentation

BAEFormer: Bi-Directional and Early Interaction Transformers for Bird's Eye View Semantic Segmentation

no code implementations • CVPR 2023 • Cong Pan, Yonghao He, Junran Peng, Qian Zhang, Wei Sui, Zhaoxiang Zhang

Moreover, we find that the image feature maps' resolution in the cross-attention module has a limited effect on the final performance.

Ranked #6 on Bird's-Eye View Semantic Segmentation on nuScenes

Autonomous Driving Bird's-Eye View Semantic Segmentation +1

SSF: Accelerating Training of Spiking Neural Networks with Stabilized Spiking Flow

no code implementations • ICCV 2023 • Jingtao Wang, Zengjie Song, Yuxi Wang, Jun Xiao, Yuran Yang, Shuqi Mei, Zhaoxiang Zhang

Surrogate gradient (SG) is one of the most effective approaches for training spiking neural networks (SNNs).

Extracting Semantic Knowledge from GANs with Unsupervised Learning

no code implementations • 30 Nov 2022 • Jianjin Xu, Zhaoxiang Zhang, Xiaolin Hu

Second, we train image-to-image translation networks on the synthesized datasets, enabling semantic-conditional image synthesis without human annotations.

Image Segmentation Image-to-Image Translation +2

BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision

2 code implementations • CVPR 2023 • Chenyu Yang, Yuntao Chen, Hao Tian, Chenxin Tao, Xizhou Zhu, Zhaoxiang Zhang, Gao Huang, Hongyang Li, Yu Qiao, Lewei Lu, Jie Zhou, Jifeng Dai

The proposed method is verified with a wide spectrum of traditional and modern image backbones and achieves new SoTA results on the large-scale nuScenes dataset.

Ranked #5 on 3D Object Detection on Rope3D

3D Object Detection

RRSR: Reciprocal Reference-based Image Super-Resolution with Progressive Feature Alignment and Selection

no code implementations • 8 Nov 2022 • Lin Zhang, Xin Li, Dongliang He, Fu Li, Yili Wang, Zhaoxiang Zhang

While previous state-of-the-art RefSR methods mainly focus on improving the efficacy and robustness of reference feature transfer, it is generally overlooked that a well reconstructed SR image should enable better SR reconstruction for its similar LR images when it is referred to as.

feature selection Image Super-Resolution

Pointly-Supervised Panoptic Segmentation

1 code implementation • 25 Oct 2022 • Junsong Fan, Zhaoxiang Zhang, Tieniu Tan

In this paper, we propose a new approach to applying point-level annotations for weakly-supervised panoptic segmentation.

Panoptic Segmentation Segmentation +3

4D Unsupervised Object Discovery

1 code implementation • 10 Oct 2022 • Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang

In this paper, we propose 4D unsupervised object discovery, jointly discovering objects from 4D data -- 3D point clouds and 2D RGB images with temporal information.

3D Instance Segmentation Object +4

MemoNav: Selecting Informative Memories for Visual Navigation

no code implementations • 20 Aug 2022 • Hongxin Li, Xu Yang, Yuran Yang, Shuqi Mei, Zhaoxiang Zhang

To address this limitation, we present the MemoNav, a novel memory mechanism for image-goal navigation, which retains the agent's informative short-term memory and long-term memory to improve the navigation performance on a multi-goal task.

Action Generation Graph Attention +2

Pro-tuning: Unified Prompt Tuning for Vision Tasks

no code implementations • 28 Jul 2022 • Xing Nie, Bolin Ni, Jianlong Chang, Gaomeng Meng, Chunlei Huo, Zhaoxiang Zhang, Shiming Xiang, Qi Tian, Chunhong Pan

To this end, we propose parameter-efficient Prompt tuning (Pro-tuning) to adapt frozen vision models to various downstream vision tasks.

Adversarial Robustness Image Classification +4

Fully Sparse 3D Object Detection

4 code implementations • 20 Jul 2022 • Lue Fan, Feng Wang, Naiyan Wang, Zhaoxiang Zhang

To enable efficient long-range LiDAR-based object detection, we build a fully sparse 3D object detector (FSD).

3D Object Detection Autonomous Driving +1

Densely Constrained Depth Estimator for Monocular 3D Object Detection

1 code implementation • 20 Jul 2022 • Yingyan Li, Yuntao Chen, JiaWei He, Zhaoxiang Zhang

So these methods only use a small number of projection constraints and produce insufficient depth candidates, leading to inaccurate depth estimation.

Depth Estimation Graph Matching +3

Implicit Sample Extension for Unsupervised Person Re-Identification

1 code implementation • CVPR 2022 • Xinyu Zhang, Dongdong Li, Zhigang Wang, Jian Wang, Errui Ding, Javen Qinfeng Shi, Zhaoxiang Zhang, Jingdong Wang

Specifically, we generate support samples from actual samples and their neighbouring clusters in the embedding space through a progressive linear interpolation (PLI) strategy.

Clustering Unsupervised Person Re-Identification

Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes

1 code implementation • CVPR 2022 • Zengjie Song, Yuxi Wang, Junsong Fan, Tieniu Tan, Zhaoxiang Zhang

Sound source localization in visual scenes aims to localize objects emitting the sound in a given image.

Contrastive Learning

Sparse Instance Activation for Real-Time Instance Segmentation

2 code implementations • CVPR 2022 • Tianheng Cheng, Xinggang Wang, Shaoyu Chen, Wenqiang Zhang, Qian Zhang, Chang Huang, Zhaoxiang Zhang, Wenyu Liu

In this paper, we propose a conceptually novel, efficient, and fully convolutional framework for real-time instance segmentation.

Ranked #8 on Real-time Instance Segmentation on MSCOCO

Object object-detection +4

HP-Capsule: Unsupervised Face Part Discovery by Hierarchical Parsing Capsule Network

no code implementations • CVPR 2022 • Chang Yu, Xiangyu Zhu, Xiaomei Zhang, Zidu Wang, Zhaoxiang Zhang, Zhen Lei

Capsule networks are designed to present the objects by a set of parts and their relationships, which provide an insight into the procedure of visual perception.

DATA: Domain-Aware and Task-Aware Self-supervised Learning

1 code implementation • CVPR 2022 • Qing Chang, Junran Peng, Lingxie Xie, Jiajun Sun, Haoran Yin, Qi Tian, Zhaoxiang Zhang

However, due to the high training costs and the unconsciousness of downstream usages, most self-supervised learning methods lack the capability to correspond to the diversities of downstream scenarios, as there are various data domains, different vision tasks and latency constraints on models.

Image Classification Model Selection +5

The Devil Is in the Details: Window-based Attention for Image Compression

2 code implementations • CVPR 2022 • Renjie Zou, Chunfeng Song, Zhaoxiang Zhang

Inspired by recent progress on Vision Transformer (ViT) and Swin Transformer, we found that combining the local-aware attention mechanism with the global-related feature learning could meet the expectation in image compression.

Ranked #1 on Image Compression on kodak

Image Compression

Emergence of Machine Language: Towards Symbolic Intelligence with Neural Networks

no code implementations • 14 Jan 2022 • Yuqi Wang, Xu-Yao Zhang, Cheng-Lin Liu, Zhaoxiang Zhang

Moreover, through experiments we show that discrete language representation has several advantages compared with continuous feature representation, from the aspects of interpretability, generalization, and robustness.

Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer

no code implementations • CVPR 2022 • Wenjian Wang, Lijuan Duan, Yuxi Wang, Qing En, Junsong Fan, Zhaoxiang Zhang

To remedy this problem, we propose an interesting and challenging cross-domain few-shot semantic segmentation task, where the training and test tasks perform on different domains.

Contrastive Learning Cross-Domain Few-Shot +3

Continual Stereo Matching of Continuous Driving Scenes With Growing Architecture

1 code implementation • CVPR 2022 • Chenghao Zhang, Kun Tian, Bin Fan, Gaofeng Meng, Zhaoxiang Zhang, Chunhong Pan

The deep stereo models have achieved state-of-the-art performance on driving scenes, but they suffer from severe performance degradation when tested on unseen scenes.

Continual Learning Stereo Matching

Towards Noiseless Object Contours for Weakly Supervised Semantic Segmentation

no code implementations • CVPR 2022 • Jing Li, Junsong Fan, Zhaoxiang Zhang

Existing methods usually generate pseudo labels from class activation map (CAM) and then train a segmentation model.

Object Pseudo Label +3

Embracing Single Stride 3D Object Detector with Sparse Transformer

2 code implementations • CVPR 2022 • Lue Fan, Ziqi Pang, Tianyuan Zhang, Yu-Xiong Wang, Hang Zhao, Feng Wang, Naiyan Wang, Zhaoxiang Zhang

In LiDAR-based 3D object detection for autonomous driving, the ratio of the object size to input scene size is significantly smaller compared to 2D detection cases.

Ranked #3 on 3D Object Detection on waymo cyclist

3D Object Detection Autonomous Driving +3

Immortal Tracker: Tracklet Never Dies

1 code implementation • 26 Nov 2021 • Qitai Wang, Yuntao Chen, Ziqi Pang, Naiyan Wang, Zhaoxiang Zhang

We employ a simple Kalman filter for trajectory prediction and preserve the tracklet by prediction when the target is not visible.

3D Multi-Object Tracking Trajectory Prediction

Object Dynamics Distillation for Scene Decomposition and Representation

no code implementations • ICLR 2022 • Qu Tang, Xiangyu Zhu, Zhen Lei, Zhaoxiang Zhang

In this paper, we work on object dynamics and propose Object Dynamics Distillation Network (ODDN), a framework that distillates explicit object dynamics (e.g., velocity) from sequential static representations.

Object Predict Future Video Frames +1

Source Data-Free Cross-Domain Semantic Segmentation: Align, Teach and Propagate

no code implementations • 22 Jun 2021 • Yuxi Wang, Jian Liang, Zhaoxiang Zhang

It is the first work to use negative pseudo labels during self-training for domain adaptation.

Domain Adaptation Representation Learning +2

GAIA: A Transfer Learning System of Object Detection that Fits Your Needs

1 code implementation • CVPR 2021 • Xingyuan Bu, Junran Peng, Junjie Yan, Tieniu Tan, Zhaoxiang Zhang

Transfer learning with pre-training on large-scale datasets has played an increasingly significant role in computer vision and natural language processing recently.

object-detection Object Detection +1

184

Image Inpainting by End-to-End Cascaded Refinement with Mask Awareness

1 code implementation • 28 Apr 2021 • Manyu Zhu, Dongliang He, Xin Li, Chao Li, Fu Li, Xiao Liu, Errui Ding, Zhaoxiang Zhang

Inpainting arbitrary missing regions is challenging because learning valid features for various masked regions is nontrivial.

Ranked #4 on Image Inpainting on CelebA-HQ

Image Inpainting valid

Distractor-Aware Fast Tracking via Dynamic Convolutions and MOT Philosophy

1 code implementation • CVPR 2021 • Zikai Zhang, Bineng Zhong, Shengping Zhang, Zhenjun Tang, Xin Liu, Zhaoxiang Zhang

A practical long-term tracker typically has three key properties: an efficient model design, an effective global re-detection strategy, and a robust distractor awareness mechanism.

Multiple Object Tracking Philosophy

RefineMask: Towards High-Quality Instance Segmentation with Fine-Grained Features

1 code implementation • CVPR 2021 • Gang Zhang, Xin Lu, Jingru Tan, Jianmin Li, Zhaoxiang Zhang, Quanquan Li, Xiaolin Hu

In this work, we propose a new method called RefineMask for high-quality instance segmentation of objects and scenes, which incorporates fine-grained features during the instance-wise segmenting process in a multi-stage manner.

Instance Segmentation Semantic Segmentation +1

210

Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation

2 code implementations • CVPR 2021 • Chufeng Tang, Hang Chen, Xiao Li, Jianmin Li, Zhaoxiang Zhang, Xiaolin Hu

Tremendous efforts have been made on instance segmentation but the mask quality is still not satisfactory.

Instance Segmentation Segmentation +1

170

Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression

2 code implementations • CVPR 2021 • Zigang Geng, Ke Sun, Bin Xiao, Zhaoxiang Zhang, Jingdong Wang

Our motivation is that regressing keypoint positions accurately needs to learn representations that focus on the keypoint regions.

Keypoint Detection

4,986

Learnable Graph Matching: Incorporating Graph Partitioning with Deep Feature Learning for Multiple Object Tracking

1 code implementation • CVPR 2021 • JiaWei He, Zehao Huang, Naiyan Wang, Zhaoxiang Zhang

The association problem then becomes a general graph matching problem between the tracklet graph and the detection graph.

Graph Matching graph partitioning +3

111

RangeDet: In Defense of Range View for LiDAR-based 3D Object Detection

1 code implementation • 18 Mar 2021 • Lue Fan, Xuan Xiong, Feng Wang, Naiyan Wang, Zhaoxiang Zhang

The most notable difference with previous works is that our method is purely based on the range view representation.

3D Object Detection object-detection +2

183

Clothing Status Awareness for Long-Term Person Re-Identification

no code implementations • ICCV 2021 • Yan Huang, Qiang Wu, Jingsong Xu, Yi Zhong, Zhaoxiang Zhang

This work argues that these approaches are in fact not aware of the clothing status (i.e., change or no-change) of a pedestrian.

Person Re-Identification

RangeDet: In Defense of Range View for LiDAR-Based 3D Object Detection

1 code implementation • ICCV 2021 • Lue Fan, Xuan Xiong, Feng Wang, Naiyan Wang, Zhaoxiang Zhang

We first analyze the existing range-view-based methods and find two issues overlooked by previous works: 1) the scale variation between nearby and far away objects; 2) the inconsistency between the 2D range image coordinates used in feature extraction and the 3D Cartesian coordinates used in output.

3D Object Detection object-detection +2

183

Uncertainty-Aware Pseudo Label Refinery for Domain Adaptive Semantic Segmentation

no code implementations • ICCV 2021 • Yuxi Wang, Junran Peng, Zhaoxiang Zhang

Unsupervised domain adaptation for semantic segmentation aims to assign pixel-level labels to the unlabeled target domain by transferring knowledge from the labeled source domain.

Pseudo Label Self-Supervised Learning +2

Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation

1 code implementation • 9 Dec 2020 • Xueyi Li, Tianfei Zhou, Jianwu Li, Yi Zhou, Zhaoxiang Zhang

We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths, which can be used for training more accurate segmentation models.

Ranked #37 on Weakly-Supervised Semantic Segmentation on COCO 2014 val (using extra training data)

Segmentation Structured Prediction +2

Unsupervised Object Detection with LiDAR Clues

no code implementations • CVPR 2021 • Hao Tian, Yuntao Chen, Jifeng Dai, Zhaoxiang Zhang, Xizhou Zhu

We further identify another major issue, seldom noticed by the community, that the long-tailed and open-ended (sub-)category distribution should be accommodated.

Object object-detection +2

Manual-Label Free 3D Detection via An Open-Source Simulator

no code implementations • 16 Nov 2020 • Zhen Yang, Chi Zhang, Huiming Guo, Zhaoxiang Zhang

In this paper, we propose a manual-label free 3D detection algorithm that leverages the CARLA simulator to generate a large amount of self-labeled training samples and introduces a novel Domain Adaptive VoxelNet (DA-VoxelNet) that can cross the distribution gap from the synthetic data to the real scenario.

Multi-task Layout Analysis for Historical Handwritten Documents Using Fully Convolutional Networks

no code implementations • International Joint Conference on Artificial Intelligence 2018 • Yue Xu, Fei Yin, Zhaoxiang Zhang, Cheng-Lin Liu

Layout analysis is a fundamental process in document image analysis and understanding.

GIFT: A Real-time and Scalable 3D Shape Search Engine

no code implementations • CVPR 2016 • Song Bai, Xiang Bai, Zhichao Zhou, Zhaoxiang Zhang, Longin Jan Latecki

We name the proposed 3D shape search engine, which combines GPU acceleration and Inverted File Twice, as GIFT.

3D Shape Classification 3D Shape Retrieval +2
