Vote3Deep: Fast Object Detection in 3D Point Clouds Using Efficient Convolutional Neural Networks

21 Sep 2016  ·  Martin Engelcke, Dushyant Rao, Dominic Zeng Wang, Chi Hay Tong, Ingmar Posner

This paper proposes a computationally efficient approach to detecting objects natively in 3D point clouds using convolutional neural networks (CNNs). In particular, this is achieved by leveraging a feature-centric voting scheme to implement novel convolutional layers which explicitly exploit the sparsity encountered in the input. To this end, we examine the trade-off between accuracy and speed for different architectures and additionally propose to use an L1 penalty on the filter activations to further encourage sparsity in the intermediate representations. To the best of our knowledge, this is the first work to propose sparse convolutional layers and L1 regularisation for efficient large-scale processing of 3D data. We demonstrate the efficacy of our approach on the KITTI object detection benchmark and show that Vote3Deep models with as few as three layers outperform the previous state of the art in both laser and laser-vision based approaches by margins of up to 40% while remaining highly competitive in terms of processing time.
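The voting idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `voting_conv3d` and `l1_activation_penalty`, the single-channel grid, the odd kernel size, and the zero-padded borders are all assumptions made for illustration. Instead of sliding a filter over every cell, each non-zero input cell adds its value times the flipped filter weights to the output cells around it, so the work scales with the number of occupied cells rather than with the grid volume.

```python
import numpy as np


def voting_conv3d(grid, kernel):
    """Single-channel 3D convolution computed by voting (illustrative sketch).

    Only non-zero cells of `grid` cast votes. The result matches a dense,
    zero-padded sliding-window filter with the same `kernel`.
    """
    assert grid.ndim == 3 and kernel.ndim == 3
    assert all(k % 2 == 1 for k in kernel.shape), "odd kernel sizes assumed"

    out = np.zeros(grid.shape, dtype=float)
    half = np.array(kernel.shape) // 2
    # Flipping the kernel makes voting equivalent to the sliding-window filter.
    flipped = kernel[::-1, ::-1, ::-1]

    for idx in np.argwhere(grid != 0):
        value = grid[tuple(idx)]
        lo = idx - half          # vote region in grid coordinates
        hi = idx + half + 1
        # Clip the vote region to the grid bounds.
        g_lo = np.maximum(lo, 0)
        g_hi = np.minimum(hi, grid.shape)
        # Matching sub-block of the flipped kernel.
        k_lo = g_lo - lo
        k_hi = k_lo + (g_hi - g_lo)
        out[g_lo[0]:g_hi[0], g_lo[1]:g_hi[1], g_lo[2]:g_hi[2]] += (
            value * flipped[k_lo[0]:k_hi[0], k_lo[1]:k_hi[1], k_lo[2]:k_hi[2]]
        )
    return out


def l1_activation_penalty(activations, strength):
    """L1 penalty on activations; `strength` is a hypothetical hyperparameter."""
    return strength * np.abs(activations).sum()


# Usage: a mostly empty grid with two occupied cells.
grid = np.zeros((8, 8, 8))
grid[2, 3, 4] = 1.0
grid[6, 1, 0] = -0.5
kernel = np.random.default_rng(0).standard_normal((3, 3, 3))

activations = np.maximum(voting_conv3d(grid, kernel), 0.0)  # ReLU
penalty = l1_activation_penalty(activations, strength=1e-3)
```

Adding a term like `penalty` to a training loss pushes activations towards zero, which in turn leaves fewer cells to vote in the next layer.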


Task              Dataset                     Model      Metric Name  Metric Value  Global Rank
Object Detection  KITTI Cars Easy             Vote3Deep  AP           76.79         # 4
Object Detection  KITTI Cars Hard             Vote3Deep  AP           63.23         # 3
Object Detection  KITTI Cars Moderate         Vote3Deep  AP           68.24         # 3
Object Detection  KITTI Cyclists Easy         Vote3Deep  AP           79.92         # 1
Object Detection  KITTI Cyclists Hard         Vote3Deep  AP           62.98         # 1
Object Detection  KITTI Cyclists Moderate     Vote3Deep  AP           67.88         # 1
Object Detection  KITTI Pedestrians Easy      Vote3Deep  AP           68.39         # 1
Object Detection  KITTI Pedestrians Hard      Vote3Deep  AP           52.59         # 1
Object Detection  KITTI Pedestrians Moderate  Vote3Deep  AP           55.37         # 1

Methods