VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection

CVPR 2018 · Yin Zhou, Oncel Tuzel

Accurate detection of objects in 3D point clouds is a central problem in many applications, such as autonomous navigation, housekeeping robots, and augmented/virtual reality. To interface a highly sparse LiDAR point cloud with a region proposal network (RPN), most existing efforts have focused on hand-crafted feature representations, for example, a bird's eye view projection. In this work, we remove the need for manual feature engineering for 3D point clouds and propose VoxelNet, a generic 3D detection network that unifies feature extraction and bounding box prediction into a single-stage, end-to-end trainable deep network. Specifically, VoxelNet divides a point cloud into equally spaced 3D voxels and transforms a group of points within each voxel into a unified feature representation through the newly introduced voxel feature encoding (VFE) layer. In this way, the point cloud is encoded as a descriptive volumetric representation, which is then connected to an RPN to generate detections. Experiments on the KITTI car detection benchmark show that VoxelNet outperforms state-of-the-art LiDAR-based 3D detection methods by a large margin. Furthermore, our network learns an effective discriminative representation of objects with various geometries, leading to encouraging results in 3D detection of pedestrians and cyclists, based only on LiDAR.
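The voxel partition and VFE steps described in the abstract can be sketched in plain Python. This is a minimal, hedged illustration of the idea, not the authors' implementation: the voxel size, the per-voxel point cap, the centroid-offset input features, the helper names, and the random weights are all assumptions chosen for illustration, and batch normalization, the convolutional middle layers, and the RPN are omitted. Points are assumed to be (x, y, z, reflectance) tuples.

```python
import math
import random
from collections import defaultdict

# Hypothetical settings for illustration only (not taken from the abstract).
VOXEL_SIZE = (0.2, 0.2, 0.4)  # voxel extent along x, y, z in metres
MAX_POINTS = 35               # cap on points kept per voxel


def voxelize(points, voxel_size=VOXEL_SIZE, max_points=MAX_POINTS, seed=0):
    """Group (x, y, z, r) points by the integer index of the voxel containing them."""
    rng = random.Random(seed)
    voxels = defaultdict(list)
    for p in points:
        key = tuple(math.floor(c / s) for c, s in zip(p[:3], voxel_size))
        voxels[key].append(p)
    # Randomly subsample crowded voxels so each keeps at most max_points points.
    for key, pts in voxels.items():
        if len(pts) > max_points:
            voxels[key] = rng.sample(pts, max_points)
    return dict(voxels)


def augment(pts):
    """Append each point's offset from the voxel's point centroid (4 -> 7 features)."""
    n = len(pts)
    cx, cy, cz = (sum(p[i] for p in pts) / n for i in range(3))
    return [list(p) + [p[0] - cx, p[1] - cy, p[2] - cz] for p in pts]


def linear_relu(x, weights, bias):
    """Fully connected layer followed by ReLU (batch norm omitted)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def vfe_layer(point_feats, weights, bias):
    """VFE-style layer: point-wise FC, element-wise max pool, then concatenation."""
    pointwise = [linear_relu(f, weights, bias) for f in point_feats]
    aggregated = [max(col) for col in zip(*pointwise)]  # voxel-wide summary
    return [f + aggregated for f in pointwise]          # each point + summary


def voxel_feature(point_feats):
    """Collapse a voxel's point features into one vector by element-wise max."""
    return [max(col) for col in zip(*point_feats)]


def to_dense(features, grid_shape, channels):
    """Scatter sparse voxel features into a dense X x Y x Z x C nested list."""
    nx, ny, nz = grid_shape
    grid = [[[[0.0] * channels for _ in range(nz)] for _ in range(ny)]
            for _ in range(nx)]
    for (i, j, k), f in features.items():
        grid[i][j][k] = list(f)
    return grid


# Toy point cloud inside a unit cube.
rng = random.Random(42)
cloud = [(rng.random(), rng.random(), rng.random(), rng.random()) for _ in range(200)]
voxels = voxelize(cloud)
# Hypothetical random weights: 7 input features -> 4 units, so the VFE output has 8.
W = [[rng.gauss(0.0, 0.5) for _ in range(7)] for _ in range(4)]
b = [0.0] * 4
features = {k: voxel_feature(vfe_layer(augment(v), W, b)) for k, v in voxels.items()}
grid = to_dense(features, (5, 5, 3), 8)  # unit cube -> 5 x 5 x 3 voxels
print(len(features), "non-empty voxels, each with",
      len(next(iter(features.values()))), "features")
```

Each VFE output concatenates a point's own feature with the voxel-wide maximum, so every point carries a summary of its neighbours; a trained network could stack such layers and would learn the weights rather than draw them at random. The dense grid stands in for the volumetric representation that the abstract says is passed on toward the RPN.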

Task                             Dataset                        AP (%)  Global Rank
Birds Eye View Object Detection  KITTI Cars Easy                89.35   6
3D Object Detection              KITTI Cars Easy                77.47   24
Object Localization              KITTI Cars Easy                89.35   1
Birds Eye View Object Detection  KITTI Cars Easy val            89.6    1
Object Localization              KITTI Cars Hard                77.39   1
3D Object Detection              KITTI Cars Hard                57.73   23
Birds Eye View Object Detection  KITTI Cars Hard                77.39   7
Birds Eye View Object Detection  KITTI Cars Hard val            78.57   1
Birds Eye View Object Detection  KITTI Cars Moderate            79.26   8
Object Localization              KITTI Cars Moderate            79.26   2
3D Object Detection              KITTI Cars Moderate            65.11   29
Birds Eye View Object Detection  KITTI Cars Moderate val        84.81   1
Birds Eye View Object Detection  KITTI Cyclist Easy val         74.41   1
Birds Eye View Object Detection  KITTI Cyclist Hard val         50.49   1
Birds Eye View Object Detection  KITTI Cyclist Moderate val     52.18   1
3D Object Detection              KITTI Cyclists Easy            61.22   12
Object Localization              KITTI Cyclists Easy            66.7    2
3D Object Detection              KITTI Cyclists Hard            44.37   12
Object Localization              KITTI Cyclists Hard            50.55   2
3D Object Detection              KITTI Cyclists Moderate        48.36   12
Object Localization              KITTI Cyclists Moderate        54.76   2
Birds Eye View Object Detection  KITTI Cyclists Moderate        54.76   6
Birds Eye View Object Detection  KITTI Pedestrian Easy val      65.95   1
Birds Eye View Object Detection  KITTI Pedestrian Hard val      56.98   1
Birds Eye View Object Detection  KITTI Pedestrian Moderate val  61.05   1
3D Object Detection              KITTI Pedestrians Easy         39.48   9
Object Localization              KITTI Pedestrians Easy         46.13   2
Object Localization              KITTI Pedestrians Hard         38.11   3
3D Object Detection              KITTI Pedestrians Hard         31.51   9
3D Object Detection              KITTI Pedestrians Moderate     33.69   12
Birds Eye View Object Detection  KITTI Pedestrians Moderate     40.74   6
Object Localization              KITTI Pedestrians Moderate     40.74   3
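Every score in the table is an average precision (AP) in percent. As a hedged sketch, assuming the 11-point interpolated AP that the KITTI benchmark used around the time of this paper (the benchmark later moved to 40 recall positions), AP for one class and difficulty level can be computed from ranked detections as below. The function name and the input values are hypothetical; deciding whether a detection is a true positive (an IoU test against ground-truth boxes) and KITTI's difficulty and don't-care filtering are not shown. This is the generic recipe, not the official KITTI evaluation code.

```python
def interpolated_ap(scores, is_tp, num_gt, recall_points=11):
    """Interpolated average precision over evenly spaced recall thresholds.

    scores: detector confidence for each detection.
    is_tp:  whether each detection matched a not-yet-claimed ground-truth box.
    num_gt: number of ground-truth objects (some may never be detected).
    """
    # Walk detections from most to least confident, tracking precision/recall.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    precisions, recalls = [], []
    tp = fp = 0
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)

    total = 0.0
    for k in range(recall_points):
        r = k / (recall_points - 1)  # 0.0, 0.1, ..., 1.0 for 11 points
        # Interpolated precision: best precision at any recall >= r (0 if unreached).
        total += max((p for p, rc in zip(precisions, recalls) if rc >= r),
                     default=0.0)
    return total / recall_points


# Made-up example: four detections, four ground-truth objects.
ap = interpolated_ap([0.9, 0.8, 0.7, 0.6], [True, False, True, True], num_gt=4)
print(round(ap, 4))  # prints 0.6136
```

Taking the best precision at any recall at least as high as each threshold makes the interpolated curve non-increasing in recall, and recall levels the detector never reaches contribute zero to the average.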

Methods