An End-to-End Transformer Model for 3D Object Detection

ICCV 2021 · Ishan Misra, Rohit Girdhar, Armand Joulin

We propose 3DETR, an end-to-end Transformer-based object detection model for 3D point clouds. Unlike existing detection methods, which employ a number of 3D-specific inductive biases, 3DETR requires minimal modifications to the vanilla Transformer block. Specifically, we find that a standard Transformer with non-parametric queries and Fourier positional embeddings is competitive with specialized architectures that employ libraries of 3D-specific operators with hand-tuned hyperparameters. Moreover, 3DETR is conceptually simple and easy to implement, enabling further improvements by incorporating 3D domain knowledge. Through extensive experiments, we show that 3DETR outperforms the well-established and highly optimized VoteNet baselines on the challenging ScanNetV2 dataset by 9.5%. Furthermore, we show that 3DETR is applicable to 3D tasks beyond detection and can serve as a building block for future research.
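The abstract names two ingredients, non-parametric queries and Fourier positional embeddings, without spelling them out. The block below is a minimal sketch, not the authors' code. It assumes, from our reading of the paper rather than from this abstract, that query locations are picked by farthest point sampling over the input points and then encoded with random Fourier features (a fixed, unlearned Gaussian frequency matrix). All function names, the frequency count, and sigma are hypothetical choices. The full model, as we understand it, also passes these embeddings through a learned projection before the decoder; that step is omitted here.

```python
import math
import random


def farthest_point_sampling(points, k, seed_index=0):
    """Greedily pick k spread-out points, starting from points[seed_index].

    Each step adds the point whose squared distance to the nearest
    already-selected point is largest.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    selected = [seed_index]
    # Squared distance from every point to its nearest selected point so far.
    min_d2 = [math.dist(p, points[seed_index]) ** 2 for p in points]
    for _ in range(k - 1):
        nxt = max(range(len(points)), key=lambda i: min_d2[i])
        selected.append(nxt)
        for i, p in enumerate(points):
            min_d2[i] = min(min_d2[i], math.dist(p, points[nxt]) ** 2)
    return [points[i] for i in selected]


def make_fourier_matrix(num_freqs, dim=3, sigma=1.0, seed=0):
    """Sample a fixed (not learned) Gaussian frequency matrix B, num_freqs x dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, sigma) for _ in range(dim)] for _ in range(num_freqs)]


def fourier_embed(point, B):
    """Map point p to [sin(2*pi*B p), cos(2*pi*B p)], a vector of length 2*len(B)."""
    proj = [2 * math.pi * sum(b * x for b, x in zip(row, point)) for row in B]
    return [math.sin(v) for v in proj] + [math.cos(v) for v in proj]


# Usage: 200 random points in the unit cube stand in for a normalized point cloud.
rng = random.Random(42)
cloud = [(rng.random(), rng.random(), rng.random()) for _ in range(200)]
queries = farthest_point_sampling(cloud, k=8)
B = make_fourier_matrix(num_freqs=16)
query_embeddings = [fourier_embed(q, B) for q in queries]
print(len(queries), len(query_embeddings[0]))  # 8 32
```

Nothing here is learned: the query positions come from the input itself and the frequency matrix is sampled once and frozen, which is the sense in which such queries are "non-parametric".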


Results from the Paper


Task                  Dataset       Model    Metric    Value  Global Rank
3D Object Detection   ScanNetV2     3DETR-m  mAP@0.25  65.0   #18
3D Object Detection   ScanNetV2     3DETR-m  mAP@0.5   47.0   #19
3D Object Detection   SUN-RGBD val  3DETR-m  mAP@0.25  59.1   #20
3D Object Detection   SUN-RGBD val  3DETR-m  mAP@0.5   32.7   #20
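The suffixes @0.25 and @0.5 are 3D IoU thresholds: a predicted box can only count as a true positive if its volumetric intersection-over-union with a ground-truth box of the same class reaches the threshold, so @0.5 is the stricter setting. The sketch below shows only that per-box test, assuming axis-aligned boxes stored as (min_corner, max_corner). Full mAP additionally ranks predictions by confidence, matches them to ground truth, and averages precision over recall and classes. SUN RGB-D boxes are typically oriented, which requires a rotated-polygon intersection not shown here, and whether the threshold comparison is strict varies between evaluation scripts. The function names are hypothetical.

```python
def box_volume(box):
    """Volume of an axis-aligned box given as (min_corner, max_corner); 0 if empty."""
    (x0, y0, z0), (x1, y1, z1) = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0) * max(0.0, z1 - z0)


def iou_3d_axis_aligned(a, b):
    """3D intersection-over-union of two axis-aligned boxes."""
    (a_min, a_max), (b_min, b_max) = a, b
    # The intersection is itself an axis-aligned box (possibly empty).
    inter_box = (
        tuple(max(p, q) for p, q in zip(a_min, b_min)),
        tuple(min(p, q) for p, q in zip(a_max, b_max)),
    )
    inter = box_volume(inter_box)
    union = box_volume(a) + box_volume(b) - inter
    return inter / union if union > 0 else 0.0


gt = ((0.0, 0.0, 0.0), (2.0, 2.0, 2.0))    # volume 8
pred = ((1.0, 0.0, 0.0), (3.0, 2.0, 2.0))  # same box shifted 1 unit along x
iou = iou_3d_axis_aligned(gt, pred)        # overlap 4, union 12 -> 1/3
for thresh in (0.25, 0.5):
    print(f"IoU={iou:.3f}, true positive at {thresh}: {iou >= thresh}")
```

A one-unit shift of a 2-unit box already drops IoU to 1/3, so this prediction passes the 0.25 test (prints True) but fails the 0.5 test (prints False). This is why mAP@0.5 values in the table are well below the mAP@0.25 values.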

Methods