Object Recognition

486 papers with code • 7 benchmarks • 42 datasets

Object recognition is a computer vision technique for detecting and classifying objects in images or videos. Because it combines object detection with image classification, state-of-the-art results are tracked separately for each component task, under object detection and under image classification.

(Image credit: TensorFlow Object Detection API)
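
The two-part description above (detect, then classify) can be sketched in a few lines. This is only an illustrative sketch, not the API of any library listed on this page: `recognize`, `crop`, `toy_detect`, and `toy_classify` are hypothetical names, and the "models" are hard-coded stand-ins for trained networks. Many detectors used in practice predict boxes and labels jointly; the steps are separated here only to mirror the description.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max), max is exclusive
Image = Sequence[Sequence[int]]   # toy grayscale image: rows of pixel values


@dataclass
class Recognition:
    box: Box
    label: str


def crop(image: Image, box: Box) -> List[List[int]]:
    """Return the sub-image covered by `box`."""
    x0, y0, x1, y1 = box
    return [list(row[x0:x1]) for row in image[y0:y1]]


def recognize(
    image: Image,
    detect: Callable[[Image], List[Box]],
    classify: Callable[[List[List[int]]], str],
) -> List[Recognition]:
    """Detection proposes boxes; classification labels each cropped region."""
    return [Recognition(box, classify(crop(image, box))) for box in detect(image)]


# Hypothetical stand-ins for trained models (assumptions, for illustration only).
def toy_detect(image: Image) -> List[Box]:
    """Pretend detector: always returns the same two boxes."""
    return [(0, 0, 2, 2), (2, 2, 4, 4)]


def toy_classify(patch: List[List[int]]) -> str:
    """Pretend classifier: 'bright' if the mean pixel value exceeds 127."""
    pixels = [p for row in patch for p in row]
    return "bright" if sum(pixels) / len(pixels) > 127 else "dark"


image = [
    [255, 255, 0, 0],
    [255, 255, 0, 0],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
]
results = recognize(image, toy_detect, toy_classify)
for r in results:
    print(r.box, r.label)
```

Passing the detector and classifier as callables keeps the two component tasks swappable, which matches how their results are tracked separately.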

Benchmarks

These leaderboards are used to track progress in Object Recognition.

Dataset	Best Model
MECCANO	Faster-RCNN
CIFAR10-DVS	SSNN
DVS128 Gesture	SSNN
N-Caltech 101	SSNN
ObjectNet (ImageNet classes, trained on ImageNet)	ObjectNet-Baseline
ObjectNet (ImageNet classes)	ObjectNet-Baseline
ObjectNet (All classes)	ObjectNet-Baseline

Libraries

Use these libraries to find Object Recognition models and implementations.

Repository	Papers	Stars
peymanbateni/simple-cnaps	3	110
plai-group/simple-cnaps	3	(not listed)
open-mmlab/mmdetection	2	27,878
pytorch/vision	2	15,476

See all 11 libraries.

Latest papers

Probing Multimodal Large Language Models for Global and Local Semantic Representations

kobayashikanna01/probing_MLLM_rep • 27 Feb 2024

The advancement of Multimodal Large Language Models (MLLMs) has greatly accelerated the development of applications in understanding integrated texts and images.

CLoVe: Encoding Compositional Language in Contrastive Vision-Language Models

netflix/clove • 22 Feb 2024

Recent years have witnessed a significant increase in the performance of Vision and Language tasks.

SHIELD: An Evaluation Benchmark for Face Spoofing and Forgery Detection with Multimodal Large Language Models

FaceOnLive/Face-Liveness-Detection-SDK-Linux • 6 Feb 2024

For the face forgery detection task, we evaluate GAN-based and diffusion-based data with both visual and acoustic modalities.

200 stars

Lightweight Pixel Difference Networks for Efficient Visual Representation Learning

hellozhuo/pidinet • 1 Feb 2024

With PDC and Bi-PDC, we further present two lightweight deep networks named Pixel Difference Networks (PiDiNet) and Binary PiDiNet (Bi-PiDiNet) respectively to learn highly efficient yet more accurate representations for visual tasks including edge detection and object recognition.

414 stars

Self-supervised learning of video representations from a child's perspective

eminorhan/video-models • 1 Feb 2024

These results suggest that important temporal aspects of a child's internal model of the world may be learnable from their visual experience using highly generic learning algorithms and without strong inductive biases.

Local Feature Matching Using Deep Learning: A Survey

vignywang/awesome-local-feature-matching • 31 Jan 2024

The objective of this endeavor is to furnish a comprehensive overview of local feature matching methods.

pix2gestalt: Amodal Segmentation by Synthesizing Wholes

cvlab-columbia/pix2gestalt • 25 Jan 2024

We introduce pix2gestalt, a framework for zero-shot amodal segmentation, which learns to estimate the shape and appearance of whole objects that are only partially visible behind occlusions.

ContextMix: A context-aware data augmentation method for industrial visual inspection systems

hy2mk/contextmix • 18 Jan 2024

With the minimal additional computation cost of image resizing, ContextMix enhances performance compared to existing augmentation techniques.

Seeing the roads through the trees: A benchmark for modeling spatial dependencies with aerial imagery

isaaccorley/chesapeakersc • 12 Jan 2024

In this work we propose a road segmentation benchmark dataset, Chesapeake Roads Spatial Context (RSC), for evaluating the spatial long-range context understanding of geospatial machine learning models and show how commonly used semantic segmentation models can fail at this task.

CLIP-guided Federated Learning on Heterogeneous and Long-Tailed Data

shijiangming1/clip2fl • 14 Dec 2023

For server-side learning, in order to mitigate the heterogeneity and class-distribution imbalance, we generate federated features to retrain the server model.
