Object Detection
3692 papers with code • 84 benchmarks • 256 datasets
Object Detection is a computer vision task in which the goal is to detect and locate objects of interest in an image or video. The task involves identifying the position and boundaries of each object, and classifying the objects into different categories. It forms a crucial part of visual recognition, alongside image classification and retrieval.
The state-of-the-art methods can be categorized into two main types: one-stage methods and two-stage methods:
- One-stage methods prioritize inference speed, and example models include YOLO, SSD and RetinaNet.
- Two-stage methods prioritize detection accuracy, and example models include Faster R-CNN, Mask R-CNN and Cascade R-CNN.
The most popular benchmark is the MS COCO dataset. Models are typically evaluated according to the mean Average Precision (mAP) metric.
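To make the evaluation metric concrete, here is a minimal sketch of how Average Precision can be computed for a single class on a single image. It is a simplified illustration, not the official COCO evaluation protocol: the function names `iou` and `average_precision`, the `(x_min, y_min, x_max, y_max)` box layout, and the fixed IoU threshold of 0.5 are all assumptions made for this example. Mean Average Precision would then be the mean of such per-class values (COCO additionally averages over several IoU thresholds).

```python
from typing import List, Tuple

# Assumed box layout for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    detections: List[Tuple[float, Box]],  # (confidence score, box)
    ground_truth: List[Box],
    iou_threshold: float = 0.5,  # assumed threshold for this sketch
) -> float:
    """Area under the interpolated precision-recall curve for one class."""
    if not ground_truth:
        return 0.0

    matched = [False] * len(ground_truth)
    tp = fp = 0
    precisions: List[float] = []
    recalls: List[float] = []

    # Process detections from most to least confident.
    for _score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        # Find the best-overlapping ground-truth box that is still unmatched.
        best_iou, best_idx = 0.0, -1
        for idx, gt in enumerate(ground_truth):
            if matched[idx]:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx

        if best_idx >= 0 and best_iou >= iou_threshold:
            matched[best_idx] = True
            tp += 1
        else:
            fp += 1

        precisions.append(tp / (tp + fp))
        recalls.append(tp / len(ground_truth))

    # Interpolate: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum rectangle areas under the precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

In this sketch each ground-truth box can be matched by at most one detection, so duplicate detections of the same object count as false positives. That is why a detector is rewarded for both accurate localization and well-calibrated confidence ranking.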
(Image credit: Detectron)
Subtasks
- 3D Object Detection
- Real-Time Object Detection
- RGB Salient Object Detection
- Few-Shot Object Detection
- Video Object Detection
- RGB-D Salient Object Detection
- Open Vocabulary Object Detection
- Object Detection In Aerial Images
- Weakly Supervised Object Detection
- Small Object Detection
- Robust Object Detection
- Medical Object Detection
- Zero-Shot Object Detection
- Open World Object Detection
- Co-Salient Object Detection
- Dense Object Detection
- Object Proposal Generation
- Video Salient Object Detection
- Camouflaged Object Segmentation
- License Plate Detection
- Head Detection
- Multiview Detection
- 3D Object Detection From Monocular Images
- One-Shot Object Detection
- Moving Object Detection
- Surgical Tool Detection
- Described Object Detection
- Body Detection
- Pupil Detection
- Object Detection In Indoor Scenes
- Class-agnostic Object Detection
- Semantic Part Detection
- Object Skeleton Detection
- Fish Detection
- Multiple Affordance Detection
- Weakly Supervised 3D Detection
Most implemented papers
Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields
We present an approach to efficiently detect the 2D pose of multiple people in an image.
Searching for MobileNetV3
We achieve new state of the art results for mobile classification, detection and segmentation.
Towards Deep Learning Models Resistant to Adversarial Attacks
Its principled nature also enables us to identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal.
RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free
COCO test-dev results are up to 41.4 mAP for RetinaMask-101 vs 39.1 mAP for RetinaNet-101, while the runtime is the same during evaluation.
R-FCN: Object Detection via Region-based Fully Convolutional Networks
In contrast to previous region-based detectors such as Fast/Faster R-CNN that apply a costly per-region subnetwork hundreds of times, our region-based detector is fully convolutional with almost all computation shared on the entire image.
Masked Autoencoders Are Scalable Vision Learners
Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels.
An Implementation of Faster RCNN with Study for Region Sampling
We adapted the joint-training scheme of the Faster RCNN framework from Caffe to TensorFlow as a baseline implementation for object detection.
A ConvNet for the 2020s
The "Roaring 20s" of visual recognition began with the introduction of Vision Transformers (ViTs), which quickly superseded ConvNets as the state-of-the-art image classification model.
VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection
Accurate detection of objects in 3D point clouds is a central problem in many applications, such as autonomous navigation, housekeeping robots, and augmented/virtual reality.
Deep High-Resolution Representation Learning for Visual Recognition
High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection.