Object Recognition

486 papers with code • 7 benchmarks • 42 datasets

Object recognition is a computer vision technique for detecting and classifying objects in images or videos. Because it combines object detection and image classification, state-of-the-art results are tracked separately under each of those component tasks.

(Image credit: TensorFlow Object Detection API)
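The detection-plus-classification decomposition described above can be made concrete with a minimal Python sketch. `recognize`, `detect`, `classify`, and the `Box` convention are hypothetical names for illustration only. They are not part of the TensorFlow Object Detection API or any other library, and the toy detector and classifier stand in for real models. Many modern detectors predict boxes and class labels jointly in a single network, so this shows only the conceptual two-stage split.

```python
import numpy as np
from typing import Callable, List, Tuple

# Hypothetical box convention: (x0, y0, x1, y1) in pixels; x1 and y1 are exclusive.
Box = Tuple[int, int, int, int]


def recognize(
    image: np.ndarray,
    detect: Callable[[np.ndarray], List[Box]],
    classify: Callable[[np.ndarray], str],
) -> List[Tuple[Box, str]]:
    """Two-stage recognition: localize objects, then label each cropped region."""
    results = []
    for x0, y0, x1, y1 in detect(image):
        crop = image[y0:y1, x0:x1]  # rows index y, columns index x
        results.append(((x0, y0, x1, y1), classify(crop)))
    return results


# Toy stand-ins (hypothetical): a fixed-box "detector" and a brightness "classifier".
img = np.zeros((8, 8))
img[2:5, 2:5] = 1.0
toy_detect = lambda im: [(2, 2, 5, 5), (5, 5, 8, 8)]
toy_classify = lambda crop: "bright" if crop.mean() > 0.5 else "dark"
print(recognize(img, toy_detect, toy_classify))
# [((2, 2, 5, 5), 'bright'), ((5, 5, 8, 8), 'dark')]
```

Keeping the detector and classifier as plain callables makes it easy to swap in real models without changing the glue logic.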


Exploring the Transferability of Visual Prompting for Multimodal Large Language Models

zycheiheihei/transferable-visual-prompting 17 Apr 2024

To achieve this, we propose Transferable Visual Prompting (TVP), a simple and effective approach to generating visual prompts that, after being trained on only one model, can transfer to different models and improve their performance on downstream tasks.

Stars: 9
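In much of the visual-prompting literature, a visual prompt is a learnable pixel-space pattern added to input images while the model's weights stay frozen. The sketch below shows only how a generic padding-style prompt (one shared pattern restricted to a border frame) is applied to a batch. It is an assumption-based illustration, not TVP's training procedure, and `border_mask`, `apply_visual_prompt`, and `pad` are hypothetical names.

```python
import numpy as np


def border_mask(h: int, w: int, pad: int) -> np.ndarray:
    """1.0 on a `pad`-pixel frame around the image, 0.0 in the interior; shape (h, w, 1)."""
    mask = np.ones((h, w, 1), dtype=np.float32)
    mask[pad:h - pad, pad:w - pad] = 0.0
    return mask


def apply_visual_prompt(images: np.ndarray, prompt: np.ndarray, pad: int) -> np.ndarray:
    """Add one shared prompt (h, w, c) to every image in a batch (n, h, w, c), border only."""
    _, h, w, _ = images.shape
    prompted = images + border_mask(h, w, pad) * prompt  # broadcasts over the batch axis
    return np.clip(prompted, 0.0, 1.0)  # keep pixels in the valid [0, 1] range


# Hypothetical usage: random images and a small random prompt.
rng = np.random.default_rng(0)
images = rng.uniform(size=(2, 32, 32, 3)).astype(np.float32)
prompt = rng.normal(scale=0.1, size=(32, 32, 3)).astype(np.float32)
out = apply_visual_prompt(images, prompt, pad=4)
print(out.shape)  # (2, 32, 32, 3)
```

Because the same prompt tensor is shared across all inputs, training it (not shown here) would optimize a single set of pixels for the whole downstream task.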

MindSet: Vision. A toolbox for testing DNNs on key psychological experiments

faceonlive/ai-research 8 Apr 2024

Multiple benchmarks have been developed to assess the alignment between deep neural networks (DNNs) and human vision.

Stars: 152

Is CLIP the main roadblock for fine-grained open-world perception?

lorebianchi98/fg-ovd 4 Apr 2024

Modern applications increasingly demand flexible computer vision models that adapt to novel concepts not encountered during training.

Stars: 14

One Noise to Rule Them All: Multi-View Adversarial Attacks with Universal Perturbation

memoatwit/universalperturbation 2 Apr 2024

This paper presents a novel universal perturbation method for generating robust multi-view adversarial examples in 3D object recognition.

Stars: 0
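The core idea of a universal multi-view perturbation is that one perturbation is shared across every rendered view, so it must raise the recognition loss for all of them at once. The sketch below illustrates this with a swapped-in, PGD-style update: it averages per-view gradients, takes a signed step, and projects onto an L-infinity ball of radius `eps`. The gradients are assumed to come from some external model, and `project_linf`, `update_universal`, and `perturb_views` are hypothetical names. This is not the paper's optimization procedure.

```python
import numpy as np


def project_linf(delta: np.ndarray, eps: float) -> np.ndarray:
    """Keep every element of the perturbation inside the L-infinity ball of radius eps."""
    return np.clip(delta, -eps, eps)


def update_universal(delta: np.ndarray, view_grads: np.ndarray,
                     step: float, eps: float) -> np.ndarray:
    """One signed-gradient ascent step on a single perturbation shared by all views.

    view_grads has shape (num_views, *delta.shape): the loss gradient with respect
    to the perturbation for each view (assumed to be supplied by a model). Averaging
    pushes delta in a direction that raises the loss across views, not just one.
    """
    mean_grad = view_grads.mean(axis=0)
    return project_linf(delta + step * np.sign(mean_grad), eps)


def perturb_views(views: np.ndarray, delta: np.ndarray) -> np.ndarray:
    """Add the same perturbation to every view and keep pixels in [0, 1]."""
    return np.clip(views + delta, 0.0, 1.0)


# Hypothetical usage: constant gradients from 3 views drive delta to the eps bound.
delta = np.zeros((4, 4, 3))
grads = np.ones((3, 4, 4, 3))
for _ in range(5):
    delta = update_universal(delta, grads, step=0.01, eps=0.03)
print(delta.max())  # 0.03 (clipped at eps)
```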

ParFormer: Vision Transformer Baseline with Parallel Local Global Token Mixer and Convolution Attention Patch Embedding

novendrastywn/parformer-cape-2024 22 Mar 2024

In accuracy, the ParFormer models outperformed ConvNeXt and Swin Transformer, representative pure-convolution and pure-transformer models, respectively.

Stars: 2

Lifting Multi-View Detection and Tracking to the Bird's Eye View

tteepe/tracktacular 19 Mar 2024

Multi-view aggregation is a promising way to tackle challenges such as occlusion and missed detections in multi-object detection and tracking.

Stars: 12
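Multi-view aggregation in a bird's-eye view (BEV) can be illustrated geometrically: if each camera's ground-plane homography is known, detections from every view map into one shared top-down grid, where they reinforce each other even when a given camera misses or occludes an object. The sketch below is a simplified, assumption-based illustration (known 3x3 homographies, one foot point per detection, simple vote counting). It is not the feature-lifting approach of the listed paper, and `image_to_ground` and `aggregate_bev` are hypothetical names.

```python
import numpy as np


def image_to_ground(points_uv: np.ndarray, H: np.ndarray) -> np.ndarray:
    """Map (N, 2) image points on the ground plane to world (x, y) via a 3x3 homography."""
    homog = np.hstack([points_uv, np.ones((len(points_uv), 1))])  # (N, 3)
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]  # divide out the projective scale


def aggregate_bev(detections_per_view, homographies, grid_size: int, cell: float) -> np.ndarray:
    """Accumulate foot-point detections from all cameras into one BEV count grid."""
    bev = np.zeros((grid_size, grid_size))
    for pts, H in zip(detections_per_view, homographies):
        xy = image_to_ground(np.asarray(pts, dtype=float).reshape(-1, 2), H)
        ij = np.floor(xy / cell).astype(int)  # world (x, y) -> cell (column, row)
        inside = ((ij >= 0) & (ij < grid_size)).all(axis=1)  # drop points off the grid
        np.add.at(bev, (ij[inside, 1], ij[inside, 0]), 1.0)  # row = y, column = x
    return bev


# Hypothetical cameras: A sees world coordinates directly, B sees a 2x-scaled ground.
cam_a = np.eye(3)
cam_b = np.diag([0.5, 0.5, 1.0])
bev = aggregate_bev([[(1.5, 2.5)], [(3.0, 5.0), (40.0, 40.0)]],
                    [cam_a, cam_b], grid_size=4, cell=1.0)
print(bev[2, 1])  # 2.0: both cameras vote for the same ground cell
```

`np.add.at` is used instead of `bev[rows, cols] += 1` so that repeated indices within one view are all counted.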

EventRPG: Event Data Augmentation with Relevance Propagation Guidance

myuansun/eventrpg 14 Mar 2024

Based on this, we propose EventRPG, which leverages relevance propagation on the spiking neural network for more efficient augmentation.

Stars: 6

Don't Judge by the Look: Towards Motion Coherent Video Representation

bespontaneous/mca-pytorch 14 Mar 2024

Current object recognition training pipelines omit hue jittering from data augmentation, because it not only introduces appearance changes that are detrimental to classification but is also inefficient to implement in practice.

Stars: 4
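Hue jittering rotates an image's hue by a (usually random) amount while leaving HSV saturation and value unchanged. The sketch below uses only Python's standard `colorsys` module plus NumPy. The per-pixel loop is written for clarity, not speed, and `hue_jitter` and the ±0.1 sampling range are illustrative assumptions, not settings from the listed paper.

```python
import colorsys
import numpy as np


def hue_jitter(image: np.ndarray, shift: float) -> np.ndarray:
    """Rotate the hue of an RGB image by `shift` (a fraction of a full turn).

    `image` holds floats in [0, 1] with shape (H, W, 3). HSV saturation and value
    are preserved; only the hue channel is shifted, wrapping around at 1.0.
    """
    out = np.empty_like(image)
    height, width, _ = image.shape
    for y in range(height):
        for x in range(width):
            h, s, v = colorsys.rgb_to_hsv(*image[y, x])
            out[y, x] = colorsys.hsv_to_rgb((h + shift) % 1.0, s, v)
    return out


# Hypothetical augmentation step: sample a small random shift per image.
rng = np.random.default_rng(0)
shift = rng.uniform(-0.1, 0.1)
augmented = hue_jitter(rng.uniform(size=(4, 4, 3)), shift)

# A shift of 1/3 turn rotates pure red to (approximately) pure green.
red = np.array([[[1.0, 0.0, 0.0]]])
print(np.round(hue_jitter(red, 1 / 3), 6))
```

In practice, libraries provide vectorized versions; for example, torchvision's `transforms.ColorJitter` exposes a `hue` argument.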

MARVIS: Motion & Geometry Aware Real and Virtual Image Segmentation

jiayi-wu-umd/marvis 14 Mar 2024

By creating realistic synthetic images that mimic the complexities of the water surface, we provide fine-grained training data for our network (MARVIS) to discern between real and virtual images effectively.

Stars: 1

Probing Multimodal Large Language Models for Global and Local Semantic Representations

kobayashikanna01/probing_MLLM_rep 27 Feb 2024

The advancement of Multimodal Large Language Models (MLLMs) has greatly accelerated the development of applications that understand combined text and images.

Stars: 0