Multi-Level Fusion Based 3D Object Detection From Monocular Images

CVPR 2018  ·  Bin Xu, Zhenzhong Chen

In this paper, we present an end-to-end deep learning based framework for 3D object detection from a single monocular image. A deep convolutional neural network is introduced for simultaneous 2D and 3D object detection. First, 2D region proposals are generated through a region proposal network. Then shared features are learned within the proposals to predict the class probability, 2D bounding box, orientation, dimension, and 3D location. We adopt a stand-alone module to predict the disparity and extract features from the computed point cloud, so that features from the original image and the point cloud are fused at different levels for accurate 3D localization. The estimated disparity is also used for front-view feature encoding to enhance the input image, which is regarded as an input-fusion process. The proposed algorithm directly outputs both 2D and 3D object detection results in an end-to-end fashion with only a single RGB image as input. Experimental results on the challenging KITTI benchmark demonstrate that our algorithm significantly outperforms state-of-the-art methods that use only monocular images.
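
To make the fusion scheme concrete, the following is a minimal PyTorch-style sketch of the pipeline described in the abstract: a stand-alone disparity module, input fusion of RGB with the disparity-based front-view encoding, a point-cloud branch built by back-projecting the disparity, and per-RoI heads for class, 2D box, orientation, dimension, and 3D location. All module names, channel sizes, intrinsics, and the use of `roi_align` are illustrative assumptions, not the authors' released implementation.

```python
# Sketch of a multi-level (input- and feature-level) fusion detector.
# Hypothetical architecture; shapes and layers are placeholders.
import torch
import torch.nn as nn
from torchvision.ops import roi_align


class MultiLevelFusion3D(nn.Module):
    def __init__(self, feat_ch=64, roi_size=7, num_classes=3):
        super().__init__()
        # Stand-alone module that predicts a dense disparity map from RGB.
        self.disparity_net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )
        # Image branch: consumes RGB concatenated with the disparity encoding
        # (input fusion).
        self.rgb_backbone = nn.Sequential(
            nn.Conv2d(3 + 1, feat_ch, 3, stride=4, padding=1), nn.ReLU(),
        )
        # Point-cloud branch: simplified here to a conv over an XYZ front-view
        # map back-projected from the disparity.
        self.xyz_backbone = nn.Sequential(
            nn.Conv2d(3, feat_ch, 3, stride=4, padding=1), nn.ReLU(),
        )
        # Per-RoI heads over the fused features (feature-level fusion).
        fused = 2 * feat_ch * roi_size * roi_size
        self.cls_head = nn.Linear(fused, num_classes)
        self.box2d_head = nn.Linear(fused, 4)
        self.orient_head = nn.Linear(fused, 2)   # sin/cos of observation angle
        self.dim_head = nn.Linear(fused, 3)      # height, width, length
        self.loc_head = nn.Linear(fused, 3)      # 3D object center
        self.roi_size = roi_size

    def disparity_to_xyz(self, disp, fx=721.0, fy=721.0, cx=609.0, cy=172.0, baseline=0.54):
        # Back-project disparity to a camera-frame XYZ map (pinhole model).
        # Intrinsics are rough KITTI-like values, used only for illustration.
        b, _, h, w = disp.shape
        z = fx * baseline / disp.clamp(min=1e-3)
        u = torch.arange(w, device=disp.device).view(1, 1, 1, w).expand(b, 1, h, w)
        v = torch.arange(h, device=disp.device).view(1, 1, h, 1).expand(b, 1, h, w)
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        return torch.cat([x, y, z], dim=1)

    def forward(self, image, proposals):
        # proposals: list (per image) of [N, 4] boxes in pixel coordinates,
        # e.g. produced by a region proposal network (omitted in this sketch).
        disp = self.disparity_net(image).clamp(min=1e-3)
        xyz = self.disparity_to_xyz(disp)

        # Input fusion: concatenate the disparity encoding with the RGB image.
        rgb_feat = self.rgb_backbone(torch.cat([image, disp], dim=1))
        xyz_feat = self.xyz_backbone(xyz)

        # Feature fusion: pool RoI features from both branches and concatenate.
        scale = 1.0 / 4.0  # backbone stride
        rgb_roi = roi_align(rgb_feat, proposals, self.roi_size, spatial_scale=scale)
        xyz_roi = roi_align(xyz_feat, proposals, self.roi_size, spatial_scale=scale)
        fused = torch.cat([rgb_roi, xyz_roi], dim=1).flatten(1)

        return {
            "cls": self.cls_head(fused),
            "box2d": self.box2d_head(fused),
            "orientation": self.orient_head(fused),
            "dimensions": self.dim_head(fused),
            "location": self.loc_head(fused),
        }


if __name__ == "__main__":
    model = MultiLevelFusion3D()
    img = torch.randn(1, 3, 384, 1280)
    boxes = [torch.tensor([[100.0, 150.0, 300.0, 320.0]])]
    out = model(img, boxes)
    print({k: v.shape for k, v in out.items()})
```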

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|------|---------|-------|-------------|--------------|-------------|
| Vehicle Pose Estimation | KITTI Cars Hard | ML-Fusion | Average Orientation Similarity | 76.37 | #12 |
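
For reference, Average Orientation Similarity (AOS) is the joint detection-and-orientation metric defined for the KITTI benchmark (Geiger et al., 2012); the formulation below follows that definition with the original 11 recall levels and is included only as background.

$$\mathrm{AOS} = \frac{1}{11} \sum_{r \in \{0, 0.1, \dots, 1\}} \max_{\tilde{r}:\,\tilde{r} \ge r} s(\tilde{r}), \qquad s(r) = \frac{1}{|D(r)|} \sum_{i \in D(r)} \frac{1 + \cos \Delta_\theta^{(i)}}{2}\,\delta_i$$

where $D(r)$ is the set of detections at recall $r$, $\Delta_\theta^{(i)}$ is the angular difference between the estimated and ground-truth orientation of detection $i$, and $\delta_i = 1$ if detection $i$ is matched to a ground-truth box ($\delta_i = 0$ otherwise), so unmatched detections are penalized.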

Methods

No methods listed for this paper.