DSGN: Deep Stereo Geometry Network for 3D Object Detection

CVPR 2020 · Yilun Chen, Shu Liu, Xiaoyong Shen, Jiaya Jia

Most state-of-the-art 3D object detectors rely heavily on LiDAR sensors because there is a large performance gap between image-based and LiDAR-based methods. This gap is caused by the way representations are formed for prediction in 3D scenarios. Our method, called Deep Stereo Geometry Network (DSGN), significantly reduces this gap by detecting 3D objects on a differentiable volumetric representation, the 3D geometric volume, which effectively encodes 3D geometric structure in regular 3D space. With this representation, we learn depth information and semantic cues simultaneously. For the first time, we provide a simple and effective one-stage stereo-based 3D detection pipeline that jointly estimates depth and detects 3D objects in an end-to-end learning manner. Our approach outperforms previous stereo-based 3D detectors (by about 10 points in AP) and even achieves performance comparable to several LiDAR-based methods on the KITTI 3D object detection leaderboard. Our code is publicly available at https://github.com/chenyilun95/DSGN.


Task | Dataset | Model | Metric Name | Metric Value | Global Rank
Vehicle Pose Estimation | KITTI Cars Hard | DSGN (Stereo) | Average Orientation Similarity | 78.27 | #5
3D Object Detection From Stereo Images | KITTI Cars Moderate | DSGN | AP75 | 52.18 | #5
3D Object Detection From Stereo Images | KITTI Cyclists Moderate | DSGN | AP50 | 18.17 | #4
3D Object Detection From Stereo Images | KITTI Pedestrians Moderate | DSGN | AP50 | 15.55 | #6

Methods