Object Recognition
486 papers with code • 7 benchmarks • 42 datasets
Object recognition is a computer vision technique for detecting and classifying objects in images or videos. Because it combines object detection and image classification, state-of-the-art results are tracked separately under each of those component tasks.
(Image credit: TensorFlow Object Detection API)
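One common way to combine the two component tasks is a two-stage pipeline: a detector proposes bounding boxes, and a classifier labels each cropped region. The sketch below only illustrates that composition. The image format (a 2-D list of grayscale values), the box convention `(x0, y0, x1, y1)`, the `min_score` threshold, and the `toy_detect` / `toy_classify` stand-ins are all hypothetical. Many modern detectors perform localization and classification jointly in a single network instead.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical conventions for this sketch: an image is a 2-D list of
# grayscale values in [0, 1], and a box is (x0, y0, x1, y1) in pixels.
Image = List[List[float]]
Box = Tuple[int, int, int, int]


@dataclass
class Recognition:
    box: Box
    label: str
    score: float


def recognize(
    image: Image,
    detect: Callable[[Image], Sequence[Box]],
    classify: Callable[[Image], Tuple[str, float]],
    min_score: float = 0.5,
) -> List[Recognition]:
    """Detect candidate boxes, classify each crop, and keep confident results."""
    results = []
    for x0, y0, x1, y1 in detect(image):
        # Crop rows y0:y1 and columns x0:x1 of the image.
        crop = [row[x0:x1] for row in image[y0:y1]]
        label, score = classify(crop)
        if score >= min_score:
            results.append(Recognition((x0, y0, x1, y1), label, score))
    return results


# Toy stand-ins (hypothetical) for a real detector and a real classifier.
def toy_detect(image: Image) -> List[Box]:
    # Propose two boxes: the left half and the right half of the image.
    width, height = len(image[0]), len(image)
    return [(0, 0, width // 2, height), (width // 2, 0, width, height)]


def toy_classify(crop: Image) -> Tuple[str, float]:
    # Label a crop by its mean brightness; the score is the distance-based confidence.
    values = [v for row in crop for v in row]
    mean = sum(values) / len(values)
    return ("bright", mean) if mean >= 0.5 else ("dark", 1.0 - mean)


if __name__ == "__main__":
    image = [[0.0, 0.0, 1.0, 1.0],
             [0.0, 0.0, 1.0, 1.0]]
    for result in recognize(image, toy_detect, toy_classify):
        print(result)
```

Passing the detector and classifier as plain callables keeps the two component tasks independent, so either stage could be swapped for a real model without changing `recognize`.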
Libraries
Use these libraries to find Object Recognition models and implementations.
Latest papers
Exploring the Transferability of Visual Prompting for Multimodal Large Language Models
To achieve this, we propose Transferable Visual Prompting (TVP), a simple and effective approach to generate visual prompts that can transfer to different models and improve their performance on downstream tasks after being trained on only one model.
MindSet: Vision. A toolbox for testing DNNs on key psychological experiments
Multiple benchmarks have been developed to assess the alignment between deep neural networks (DNNs) and human vision.
Is CLIP the main roadblock for fine-grained open-world perception?
Modern applications increasingly demand flexible computer vision models that adapt to novel concepts not encountered during training.
One Noise to Rule Them All: Multi-View Adversarial Attacks with Universal Perturbation
This paper presents a novel universal perturbation method for generating robust multi-view adversarial examples in 3D object recognition.
ParFormer: Vision Transformer Baseline with Parallel Local Global Token Mixer and Convolution Attention Patch Embedding
The ParFormer models outperformed ConvNeXt and Swin Transformer, representative pure convolutional and pure transformer models respectively, in accuracy.
Lifting Multi-View Detection and Tracking to the Bird's Eye View
Taking advantage of multi-view aggregation presents a promising solution to tackle challenges such as occlusion and missed detection in multi-object tracking and detection.
EventRPG: Event Data Augmentation with Relevance Propagation Guidance
Based on this, we propose EventRPG, which leverages relevance propagation on the spiking neural network for more efficient augmentation.
Don't Judge by the Look: Towards Motion Coherent Video Representation
Current training pipelines in object recognition neglect Hue Jittering during data augmentation, because it not only introduces appearance changes that are detrimental to classification but is also inefficient to implement in practice.
MARVIS: Motion & Geometry Aware Real and Virtual Image Segmentation
By creating realistic synthetic images that mimic the complexities of the water surface, we provide fine-grained training data for our network (MARVIS) to discern between real and virtual images effectively.
Probing Multimodal Large Language Models for Global and Local Semantic Representations
The advancement of Multimodal Large Language Models (MLLMs) has greatly accelerated the development of applications in understanding integrated texts and images.
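The "Don't Judge by the Look" entry above refers to Hue Jittering, a data augmentation that rotates the hue of an image while leaving saturation and value unchanged. A minimal pure-Python sketch, assuming RGB pixels stored as float triples in [0, 1] and one random shift per image, is shown below. The function names and the `max_shift` default are hypothetical. Libraries such as torchvision expose this operation through the `hue` argument of `transforms.ColorJitter`. The per-pixel loop here is for clarity only and is far slower than a vectorized implementation, which is consistent with the efficiency concern the paper raises.

```python
import colorsys
import random
from typing import List, Optional, Tuple

RGB = Tuple[float, float, float]  # channels as floats in [0, 1]


def shift_hue(pixels: List[RGB], shift: float) -> List[RGB]:
    """Rotate the hue of every pixel by `shift`, a fraction of the color wheel."""
    out = []
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r, g, b)
        # Hue is circular, so wrap around with modulo 1.
        out.append(colorsys.hsv_to_rgb((h + shift) % 1.0, s, v))
    return out


def hue_jitter(
    pixels: List[RGB],
    max_shift: float = 0.1,
    rng: Optional[random.Random] = None,
) -> List[RGB]:
    """Apply one hue shift drawn uniformly from [-max_shift, max_shift] to all pixels."""
    rng = rng or random.Random()
    return shift_hue(pixels, rng.uniform(-max_shift, max_shift))
```

For example, shifting pure red by one third of the color wheel yields (approximately) pure green, while a gray pixel has zero saturation and passes through unchanged.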