Object Recognition
486 papers with code • 7 benchmarks • 42 datasets
Object recognition is a computer vision technique for detecting and classifying objects in images or videos. Because it combines object detection with image classification, the state-of-the-art tables are recorded separately for each component task.
(Image credit: TensorFlow Object Detection API)
Libraries
Use these libraries to find Object Recognition models and implementations
Datasets
Latest papers
Probing Multimodal Large Language Models for Global and Local Semantic Representations
The advancement of Multimodal Large Language Models (MLLMs) has greatly accelerated the development of applications in understanding integrated texts and images.
CLoVe: Encoding Compositional Language in Contrastive Vision-Language Models
Recent years have witnessed a significant increase in the performance of Vision and Language tasks.
SHIELD: An Evaluation Benchmark for Face Spoofing and Forgery Detection with Multimodal Large Language Models
For the face forgery detection task, we evaluate GAN-based and diffusion-based data with both visual and acoustic modalities.
Lightweight Pixel Difference Networks for Efficient Visual Representation Learning
With PDC and Bi-PDC, we further present two lightweight deep networks named Pixel Difference Networks (PiDiNet) and Binary PiDiNet (Bi-PiDiNet), respectively, to learn highly efficient yet more accurate representations for visual tasks including edge detection and object recognition.
Self-supervised learning of video representations from a child's perspective
These results suggest that important temporal aspects of a child's internal model of the world may be learnable from their visual experience using highly generic learning algorithms and without strong inductive biases.
Local Feature Matching Using Deep Learning: A Survey
The objective of this endeavor is to furnish a comprehensive overview of local feature matching methods.
pix2gestalt: Amodal Segmentation by Synthesizing Wholes
We introduce pix2gestalt, a framework for zero-shot amodal segmentation, which learns to estimate the shape and appearance of whole objects that are only partially visible behind occlusions.
ContextMix: A context-aware data augmentation method for industrial visual inspection systems
With the minimal additional computation cost of image resizing, ContextMix enhances performance compared to existing augmentation techniques.
Seeing the roads through the trees: A benchmark for modeling spatial dependencies with aerial imagery
In this work we propose a road segmentation benchmark dataset, Chesapeake Roads Spatial Context (RSC), for evaluating the spatial long-range context understanding of geospatial machine learning models and show how commonly used semantic segmentation models can fail at this task.
CLIP-guided Federated Learning on Heterogeneous and Long-Tailed Data
For server-side learning, in order to mitigate the heterogeneity and class-distribution imbalance, we generate federated features to retrain the server model.