MVFusion: Multi-View 3D Object Detection with Semantic-aligned Radar and Camera Fusion

21 Feb 2023  ·  Zizhang Wu, Guilian Chen, Yuanzhu Gan, Lei Wang, Jian Pu

Multi-view radar-camera fused 3D object detection provides a longer detection range and more useful features for autonomous driving, especially under adverse weather. Current radar-camera fusion methods propose various designs to fuse radar information with camera data. However, these fusion approaches usually adopt a straightforward concatenation of multi-modal features, which ignores semantic alignment with the radar features and sufficient correlation across modalities. In this paper, we present MVFusion, a novel Multi-View radar-camera Fusion method that achieves semantically aligned radar features and enhances cross-modal information interaction. To this end, we inject semantic alignment into the radar features via the semantic-aligned radar encoder (SARE) to produce image-guided radar features. We then propose the radar-guided fusion transformer (RGFT), which fuses the radar and image features to strengthen the correlation between the two modalities at a global scope via the cross-attention mechanism. Extensive experiments show that MVFusion achieves state-of-the-art performance (51.7% NDS and 45.3% mAP) on the nuScenes dataset. We shall release our code and trained networks upon publication.
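To make the cross-attention fusion described for RGFT concrete, below is a minimal PyTorch sketch of a cross-modal attention block. The module name, feature dimensions, and the assignment of image tokens as queries with radar tokens as keys/values are illustrative assumptions, not the paper's exact RGFT design.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Hypothetical cross-attention fusion of radar and image features.

    A sketch of the general technique only; the actual RGFT
    architecture may differ in query/key/value roles and layout.
    """

    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, img_feats, radar_feats):
        # img_feats:   (B, N_img, C)   flattened multi-view image tokens
        # radar_feats: (B, N_radar, C) encoded radar tokens
        # Image tokens attend to radar tokens (assumed role assignment),
        # so radar information guides the fused representation globally.
        fused, _ = self.attn(query=img_feats, key=radar_feats,
                             value=radar_feats)
        x = self.norm1(img_feats + fused)      # residual + norm
        x = self.norm2(x + self.ffn(x))        # feed-forward refinement
        return x

# Usage with dummy shapes (6 camera views, 400 tokens each):
fusion = CrossModalFusion()
img = torch.randn(2, 6 * 400, 256)
radar = torch.randn(2, 300, 256)
out = fusion(img, radar)  # -> (2, 2400, 256)
```

The residual-plus-LayerNorm structure follows standard transformer practice; the key idea is that attention lets every image token weigh all radar tokens, rather than concatenating features at fixed spatial positions.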

Datasets

nuScenes

Results

Task                 Dataset   Model                  Metric  Value  Global Rank
3D Object Detection  nuScenes  Camera-Radar MVFusion  NDS     51.7   #7
