Object Recognition

486 papers with code • 7 benchmarks • 42 datasets

Object recognition is a computer vision technique for detecting and classifying objects in images or videos. Because it combines object detection and image classification, the state-of-the-art tables are recorded separately under each of those component tasks.

(Image credit: TensorFlow Object Detection API)
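
The description above treats object recognition as detection followed by classification. A minimal sketch of that composition is given here, assuming a toy list-of-lists image format; `recognize`, `toy_detect`, and `toy_classify` are hypothetical names introduced for illustration and do not come from any library listed on this page.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Image = List[List[int]]          # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), end-exclusive


@dataclass
class Recognition:
    box: Box
    label: str


def crop(image: Image, box: Box) -> Image:
    """Cut the region covered by `box` out of `image`."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def recognize(
    image: Image,
    detect: Callable[[Image], List[Box]],
    classify: Callable[[Image], str],
) -> List[Recognition]:
    """Detect candidate boxes, then classify the crop inside each box."""
    return [Recognition(box, classify(crop(image, box))) for box in detect(image)]


# Toy stand-ins (assumptions for this sketch, not real models).
def toy_detect(image: Image) -> List[Box]:
    """Return one tight box around all non-zero pixels, or nothing."""
    coords = [(r, c) for r, row in enumerate(image) for c, v in enumerate(row) if v]
    if not coords:
        return []
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return [(min(rows), min(cols), max(rows) + 1, max(cols) + 1)]


def toy_classify(patch: Image) -> str:
    """Label a patch by its aspect ratio."""
    height, width = len(patch), len(patch[0])
    return "wide" if width > height else "tall"


demo_image: Image = [
    [0, 0, 0, 0, 0, 0],
    [0, 9, 9, 9, 9, 0],
    [0, 9, 9, 9, 9, 0],
    [0, 0, 0, 0, 0, 0],
]

if __name__ == "__main__":
    print(recognize(demo_image, toy_detect, toy_classify))
```

Many modern detectors predict boxes and class labels jointly in a single network, so this two-stage split is only a conceptual illustration of how the two component tasks relate.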

Benchmarks

These leaderboards track progress in Object Recognition.

Dataset	Best Model
MECCANO	Faster-RCNN
CIFAR10-DVS	SSNN
DVS128 Gesture	SSNN
N-Caltech 101	SSNN
ObjectNet (ImageNet classes, trained on ImageNet)	ObjectNet-Baseline
ObjectNet (ImageNet classes)	ObjectNet-Baseline
ObjectNet (All classes)	ObjectNet-Baseline

Libraries

Use these libraries to find Object Recognition models and implementations.

Repository	Papers	Stars
peymanbateni/simple-cnaps	3	110
plai-group/simple-cnaps	3	
open-mmlab/mmdetection	2	27,790
pytorch/vision	2	15,438

See all 11 libraries.

Latest papers

Exploring the Transferability of Visual Prompting for Multimodal Large Language Models

zycheiheihei/transferable-visual-prompting • 17 Apr 2024

To achieve this, we propose Transferable Visual Prompting (TVP), a simple and effective approach to generate visual prompts that can transfer to different models and improve their performance on downstream tasks after being trained on only one model.

MindSet: Vision. A toolbox for testing DNNs on key psychological experiments

faceonlive/ai-research • 152 stars • 8 Apr 2024

Multiple benchmarks have been developed to assess the alignment between deep neural networks (DNNs) and human vision.

Is CLIP the main roadblock for fine-grained open-world perception?

lorebianchi98/fg-ovd • 4 Apr 2024

Modern applications increasingly demand flexible computer vision models that adapt to novel concepts not encountered during training.

One Noise to Rule Them All: Multi-View Adversarial Attacks with Universal Perturbation

memoatwit/universalperturbation • 2 Apr 2024

This paper presents a novel universal perturbation method for generating robust multi-view adversarial examples in 3D object recognition.

ParFormer: Vision Transformer Baseline with Parallel Local Global Token Mixer and Convolution Attention Patch Embedding

novendrastywn/parformer-cape-2024 • 22 Mar 2024

The ParFormer models outperformed ConvNeXt and Swin Transformer for the pure convolution and transformer model in accuracy.

Lifting Multi-View Detection and Tracking to the Bird's Eye View

tteepe/tracktacular • 19 Mar 2024

Taking advantage of multi-view aggregation presents a promising solution to tackle challenges such as occlusion and missed detection in multi-object tracking and detection.

EventRPG: Event Data Augmentation with Relevance Propagation Guidance

myuansun/eventrpg • 14 Mar 2024

Based on this, we propose EventRPG, which leverages relevance propagation on the spiking neural network for more efficient augmentation.

Don't Judge by the Look: Towards Motion Coherent Video Representation

bespontaneous/mca-pytorch • 14 Mar 2024

Current training pipelines in object recognition neglect Hue Jittering when doing data augmentation as it not only brings appearance changes that are detrimental to classification, but also the implementation is inefficient in practice.

MARVIS: Motion & Geometry Aware Real and Virtual Image Segmentation

jiayi-wu-umd/marvis • 14 Mar 2024

By creating realistic synthetic images that mimic the complexities of the water surface, we provide fine-grained training data for our network (MARVIS) to discern between real and virtual images effectively.

Probing Multimodal Large Language Models for Global and Local Semantic Representations

kobayashikanna01/probing_MLLM_rep • 27 Feb 2024

The advancement of Multimodal Large Language Models (MLLMs) has greatly accelerated the development of applications in understanding integrated texts and images.
