Monocular 3D Object Detection
77 papers with code • 15 benchmarks • 5 datasets
Monocular 3D Object Detection is the task of drawing 3D bounding boxes around objects in a single 2D RGB image. It is a localization task that must be solved without extra information such as depth maps, other sensors, or multiple images.
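To make the task definition concrete, the following minimal sketch shows how a 3D bounding box relates to the 2D image it is drawn on. It is only an illustration, not the method of any paper listed here. It assumes a pinhole camera with intrinsic matrix K, a camera frame with x to the right, y down and z forward, and a box described by its centre, its (length, height, width) and a yaw angle about the vertical axis. The function names box3d_corners and project_to_image are hypothetical.

```python
import numpy as np


def box3d_corners(center, dims, yaw):
    """Return the 8 corners (8x3) of a 3D box in camera coordinates.

    Assumed parametrization (illustrative, not from the source):
    center = (x, y, z) of the box centre, dims = (length, height, width),
    yaw = rotation about the camera y axis, in radians.
    """
    length, height, width = dims
    # Corner offsets from the centre, expressed in the object's own frame.
    x = np.array([1, 1, -1, -1, 1, 1, -1, -1]) * length / 2.0
    y = np.array([1, 1, 1, 1, -1, -1, -1, -1]) * height / 2.0
    z = np.array([1, -1, -1, 1, 1, -1, -1, 1]) * width / 2.0
    offsets = np.stack([x, y, z], axis=1)

    # Rotation about the y axis by the yaw angle.
    c, s = np.cos(yaw), np.sin(yaw)
    rotation = np.array([[c, 0.0, s],
                         [0.0, 1.0, 0.0],
                         [-s, 0.0, c]])
    return offsets @ rotation.T + np.asarray(center, dtype=float)


def project_to_image(points, K):
    """Project Nx3 camera-frame points to Nx2 pixel coordinates."""
    uvw = points @ K.T
    # Perspective division by depth (points must have z > 0).
    return uvw[:, :2] / uvw[:, 2:3]


# Hypothetical intrinsics and box, chosen only for illustration.
K = np.array([[700.0, 0.0, 600.0],
              [0.0, 700.0, 180.0],
              [0.0, 0.0, 1.0]])
corners = box3d_corners(center=(0.0, 0.0, 20.0), dims=(4.0, 1.5, 1.8), yaw=0.0)
pixels = project_to_image(corners, K)
```

A monocular detector has to solve the inverse problem: given only the image, estimate the centre, dimensions and yaw so that the projected box matches the object. Depth along the z axis is the hardest quantity to recover, because projection divides it away.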
Latest papers
Delving into Motion-Aware Matching for Monocular 3D Object Tracking
In this paper, we find that the motion cue of objects along different time frames is critical in 3D multi-object tracking, which is less explored in existing monocular-based approaches.
MonoNeRD: NeRF-like Representations for Monocular 3D Object Detection
To the best of our knowledge, this work is the first to introduce volume rendering for M3D, and demonstrates the potential of implicit reconstruction for image-based 3D perception.
VCVW-3D: A Virtual Construction Vehicles and Workers Dataset with 3D Annotations
Currently, object detection applications in construction are almost entirely based on pure 2D data (both images and annotations are 2D), so the resulting artificial intelligence (AI) applications are limited to scenarios that require only 2D information.
Learning Occupancy for Monocular 3D Object Detection
Monocular 3D detection is a challenging task due to the lack of accurate 3D information.
SSD-MonoDETR: Supervised Scale-aware Deformable Transformer for Monocular 3D Object Detection
To tackle this problem, this paper proposes a novel "Supervised Scale-aware Deformable Attention" (SSDA) for monocular 3D object detection.
SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer
High-resolution images enable neural networks to learn richer visual representations.
3DPPE: 3D Point Positional Encoding for Transformer-based Multi-Camera 3D Object Detection
Although 3D measurements are not available at the inference time of monocular 3D object detection, 3DPPE uses predicted depth to approximate the real point positions.
OBMO: One Bounding Box Multiple Objects for Monocular 3D Object Detection
Unfortunately, the network cannot accurately distinguish different depths from such non-discriminative visual features, resulting in unstable depth training.
3DPPE: 3D Point Positional Encoding for Multi-Camera 3D Object Detection Transformers
Although 3D measurements are not available at the inference time of monocular 3D object detection, 3DPPE uses predicted depth to approximate the real point positions.
Cross-Modality Knowledge Distillation Network for Monocular 3D Object Detection
Leveraging LiDAR-based detectors or real LiDAR point data to guide monocular 3D detection has brought significant improvement, e.g., Pseudo-LiDAR methods.