Object Detection

3722 papers with code • 91 benchmarks • 262 datasets

Object Detection is a computer vision task in which the goal is to detect and locate objects of interest in an image or video. The task involves identifying the position and boundaries of objects in an image and classifying them into categories. It forms a crucial part of visual recognition, alongside image classification and retrieval.

State-of-the-art methods fall into two main types, one-stage methods and two-stage methods:

  • One-stage methods prioritize inference speed, and example models include YOLO, SSD and RetinaNet.

  • Two-stage methods prioritize detection accuracy, and example models include Faster R-CNN, Mask R-CNN and Cascade R-CNN.

The most popular benchmark is the MS COCO dataset. Models are typically evaluated with the mean Average Precision (mAP) metric.
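To make the evaluation concrete, here is a minimal Python sketch of average precision (AP) for a single class in a single image. It assumes corner-format boxes `(x1, y1, x2, y2)`, counts a detection as a true positive when its intersection-over-union (IoU) with a not-yet-matched ground-truth box is at least 0.5, and uses all-point interpolation of the precision-recall curve. The helper names `iou` and `average_precision` are illustrative. In practice detections are pooled over the whole dataset per class, and MS COCO's primary metric additionally averages AP over IoU thresholds from 0.50 to 0.95 (step 0.05) and over categories, using 101-point interpolation, so this is a simplification rather than a replacement for the official evaluator.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), corner format (assumption)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(dets: List[Tuple[float, Box]], gts: List[Box], thr: float = 0.5) -> float:
    """AP for one class in one image; dets are (score, box) pairs."""
    if not gts:
        return 0.0
    dets = sorted(dets, key=lambda d: d[0], reverse=True)  # highest score first
    matched = [False] * len(gts)
    tp_flags = []
    for _, box in dets:
        # Greedily match to the best-overlapping ground truth that is still unmatched.
        best_j, best_iou = -1, thr
        for j, gt in enumerate(gts):
            overlap = iou(box, gt)
            if not matched[j] and overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched[best_j] = True
            tp_flags.append(1)
        else:
            tp_flags.append(0)  # false positive: duplicate or poorly localized

    # Precision and recall after each ranked detection.
    precisions, recalls = [], []
    tp = fp = 0
    for flag in tp_flags:
        tp += flag
        fp += 1 - flag
        precisions.append(tp / (tp + fp))
        recalls.append(tp / len(gts))

    # All-point interpolation: precision at recall r is the max precision at any recall >= r.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Area under the interpolated precision-recall curve.
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_r) * p
        prev_r = r
    return ap
```

For example, with two ground-truth boxes and detections ranked TP, FP, TP, the interpolated precision is 1 up to recall 0.5 and 2/3 up to recall 1.0, giving AP = 0.5 × 1 + 0.5 × 2/3 ≈ 0.833.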

(Image credit: Detectron)

Libraries

Object Detection models and implementations are available in 40 libraries.

Most implemented papers

Deep High-Resolution Representation Learning for Visual Recognition

open-mmlab/mmdetection 20 Aug 2019

High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection.

YOLOX: Exceeding YOLO Series in 2021

Megvii-BaseDetection/YOLOX 18 Jul 2021

In this report, we present some empirically driven improvements to the YOLO series, forming a new high-performance detector: YOLOX.

Deep High-Resolution Representation Learning for Human Pose Estimation

leoxiaobin/deep-high-resolution-net.pytorch CVPR 2019

We start from a high-resolution subnetwork as the first stage, gradually add high-to-low resolution subnetworks one by one to form more stages, and connect the multi-resolution subnetworks in parallel.
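The parallel design means the high-resolution branch is never discarded; instead, the branches repeatedly exchange information. The NumPy sketch below shows one such exchange between two branches under simplifying assumptions: both branches have the same channel count, nearest-neighbour upsampling is used, and 2×2 average pooling stands in for HRNet's learned strided convolution. The helper names (`upsample2x`, `downsample2x`, `fuse`) are hypothetical and carry no learned weights.

```python
import numpy as np


def upsample2x(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def downsample2x(x: np.ndarray) -> np.ndarray:
    """2x2 average pooling; a stand-in for a strided 3x3 convolution (assumption)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def fuse(high: np.ndarray, low: np.ndarray):
    """One exchange step: each branch receives the other, resampled to its resolution."""
    new_high = high + upsample2x(low)   # low-res context flows up
    new_low = low + downsample2x(high)  # high-res detail flows down
    return new_high, new_low
```

Both outputs keep their own resolution, so the next stage can again run the branches in parallel.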

High-Resolution Representations for Labeling Pixels and Regions

leoxiaobin/deep-high-resolution-net.pytorch 9 Apr 2019

The proposed approach achieves superior results to existing single-model networks on COCO object detection.

Deformable Convolutional Networks

msracver/Deformable-ConvNets ICCV 2017

Convolutional neural networks (CNNs) are inherently limited in modeling geometric transformations due to the fixed geometric structures in their building modules.
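Deformable convolution relaxes the fixed sampling grid by adding a learned 2-D offset to each kernel tap and reading the input at the resulting fractional position with bilinear interpolation. The sketch below computes one output value of a single-channel 3×3 deformable convolution. It assumes the offsets are already given as a `(9, 2)` array of `(dy, dx)` values (in the real network they are predicted by an extra convolutional layer) and treats out-of-bounds pixels as zero; `bilinear` and `deform_conv_at` are hypothetical names.

```python
import numpy as np


def bilinear(img: np.ndarray, y: float, x: float) -> float:
    """Sample a 2-D array at fractional (y, x); out-of-bounds pixels count as 0."""
    h, w = img.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    val = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                val += weight * img[yy, xx]
    return val


def deform_conv_at(img: np.ndarray, kernel: np.ndarray, offsets: np.ndarray,
                   py: int, px: int) -> float:
    """3x3 deformable convolution output at (py, px).

    offsets has shape (9, 2): a (dy, dx) shift for each kernel tap, row-major order.
    """
    grid = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]  # regular 3x3 grid
    out = 0.0
    for k, (dy, dx) in enumerate(grid):
        y = py + dy + offsets[k, 0]  # regular position plus learned offset
        x = px + dx + offsets[k, 1]
        out += kernel[dy + 1, dx + 1] * bilinear(img, y, x)
    return out
```

With all offsets set to zero this reduces to an ordinary 3×3 convolution (cross-correlation) at that location, which is a convenient sanity check.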

ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices

tensorpack/tensorpack CVPR 2018

We introduce an extremely computation-efficient CNN architecture named ShuffleNet, which is specially designed for mobile devices with very limited computing power (e.g., 10-150 MFLOPs).
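ShuffleNet reduces cost with grouped pointwise convolutions and then applies a channel shuffle so that information can flow between groups. The shuffle itself is just a reshape, transpose, and reshape; the NumPy sketch below assumes a single `(C, H, W)` feature map with `C` divisible by the number of groups, and the name `channel_shuffle` is illustrative.

```python
import numpy as np


def channel_shuffle(x: np.ndarray, groups: int) -> np.ndarray:
    """Interleave channels across groups for a (C, H, W) feature map."""
    c, h, w = x.shape
    assert c % groups == 0, "channel count must be divisible by groups"
    # (groups, C/groups, H, W) -> swap the first two axes -> flatten back to C channels
    return x.reshape(groups, c // groups, h, w).transpose(1, 0, 2, 3).reshape(c, h, w)
```

For six channels labelled 0 to 5 and two groups, the output channel order is 0, 3, 1, 4, 2, 5, so each subsequent group convolution sees channels from both input groups.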

End-to-End Object Detection with Transformers

facebookresearch/detr ECCV 2020

We present a new method that views object detection as a direct set prediction problem.
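Treating detection as set prediction requires a one-to-one assignment between a fixed number of predictions and the ground-truth objects, chosen to minimize a matching cost; DETR solves this with the Hungarian algorithm. The pure-Python sketch below uses brute force over permutations instead (only practical for tiny sets) and a simplified cost of negative class probability plus L1 box distance; DETR's actual cost also includes a generalized-IoU term. The function `match` and its `box_weight` parameter are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 distance between two box vectors."""
    return sum(abs(p - q) for p, q in zip(a, b))


def match(pred_probs: List[List[float]], pred_boxes: List[Sequence[float]],
          gt_labels: List[int], gt_boxes: List[Sequence[float]],
          box_weight: float = 1.0) -> List[Tuple[int, int]]:
    """Return (pred_index, gt_index) pairs with minimal total cost (brute force)."""
    n_pred, n_gt = len(pred_boxes), len(gt_boxes)
    assert n_pred >= n_gt  # the model predicts a fixed, larger set of slots

    def cost(i: int, j: int) -> float:
        # Low cost when prediction i is confident in gt j's class and its box is close.
        return -pred_probs[i][gt_labels[j]] + box_weight * l1(pred_boxes[i], gt_boxes[j])

    best, best_cost = None, float("inf")
    # Each perm assigns a distinct prediction perm[j] to ground-truth object j.
    for perm in permutations(range(n_pred), n_gt):
        total = sum(cost(i, j) for j, i in enumerate(perm))
        if total < best_cost:
            best, best_cost = perm, total
    return [(i, j) for j, i in enumerate(best)]
```

Predictions left unmatched are then trained toward a "no object" class, which is what lets the set formulation avoid duplicate detections.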

Spatial Memory for Context Reasoning in Object Detection

endernewton/tf-faster-rcnn ICCV 2017

On the other hand, modeling object-object relationships requires spatial reasoning: not only do we need a memory to store the spatial layout, but also an effective reasoning module to extract spatial patterns.

ResNeSt: Split-Attention Networks

zhanghang1989/ResNeSt 19 Apr 2020

It is well known that feature-map attention and multi-path representation are important for visual recognition.
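Split-attention combines both ideas: the input is processed by several parallel branches (splits), and a per-channel softmax over the branches, computed from globally pooled features, decides how to weight them. The NumPy sketch below shows one such step under simplifying assumptions: radix R > 1, cardinality 1, and two plain dense layers with hypothetical weights `w1` and `w2` in place of ResNeSt's grouped convolutions and batch normalization. The function name `split_attention` is illustrative.

```python
import numpy as np


def split_attention(splits: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Fuse R branches with channel-wise attention.

    splits: (R, C, H, W) outputs of the R radix branches.
    w1: (D, C) and w2: (R*C, D) are hypothetical dense-layer weights.
    """
    r, c = splits.shape[:2]
    gap = splits.sum(axis=0).mean(axis=(1, 2))   # (C,) global average pool of summed branches
    hidden = np.maximum(w1 @ gap, 0.0)           # dense layer + ReLU
    logits = (w2 @ hidden).reshape(r, c)         # one logit per (branch, channel)
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn = attn / attn.sum(axis=0, keepdims=True)        # softmax across the R branches
    return (attn[:, :, None, None] * splits).sum(axis=0)  # (C, H, W) weighted sum
```

With all-zero `w2` the attention is uniform, so the output is simply the mean of the branches; learned weights let each channel favour whichever branch is most useful.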