Object Recognition

486 papers with code • 7 benchmarks • 42 datasets

Object recognition is a computer vision technique for detecting and classifying objects in images or videos. Because it combines object detection and image classification, state-of-the-art results are tracked separately on the pages for those two component tasks.
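The two-stage composition described above (detect regions, then classify each one) can be sketched as a small pipeline. This is a minimal illustration, not any particular library's API. `recognize`, `toy_detector`, `toy_classifier`, and the `min_score` threshold are hypothetical names. The toy detector simply returns fixed boxes, and the toy classifier labels a patch by its mean brightness. In practice both would be trained models.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive
Image = Sequence[Sequence[float]]  # grayscale image stored as rows of pixels


@dataclass
class Recognition:
    box: Box
    label: str
    score: float


def crop(image: Image, box: Box) -> List[List[float]]:
    """Cut the region inside `box` out of a row-major image."""
    x0, y0, x1, y1 = box
    return [list(row[x0:x1]) for row in image[y0:y1]]


def recognize(
    image: Image,
    detect: Callable[[Image], List[Box]],
    classify: Callable[[Image], Tuple[str, float]],
    min_score: float = 0.5,
) -> List[Recognition]:
    """Detection proposes boxes; classification labels each cropped region."""
    results = []
    for box in detect(image):
        label, score = classify(crop(image, box))
        if score >= min_score:  # drop low-confidence labels
            results.append(Recognition(box, label, score))
    return results


# Hypothetical toy stand-ins, for illustration only.
def toy_detector(image: Image) -> List[Box]:
    # Pretend a detector proposed two fixed regions.
    return [(0, 0, 2, 2), (2, 2, 4, 4)]


def toy_classifier(patch: Image) -> Tuple[str, float]:
    # Label a patch by its mean brightness; score is the confidence.
    pixels = [p for row in patch for p in row]
    mean = sum(pixels) / len(pixels)
    return ("bright", mean) if mean >= 0.5 else ("dark", 1.0 - mean)


image = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
for r in recognize(image, toy_detector, toy_classifier):
    print(r.box, r.label, r.score)
```

Keeping the detector and classifier as plain callables reflects the point that object recognition is the composition of two separately benchmarked tasks: either stage can be swapped without touching the other.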

(Image credit: TensorFlow Object Detection API)

Latest papers with no code

Learn and Search: An Elegant Technique for Object Lookup using Contrastive Learning

no code yet • 12 Mar 2024

The rapid proliferation of digital content and the ever-growing need for precise object recognition and segmentation have driven the advancement of cutting-edge techniques in the field of object classification and segmentation.

Mapping High-level Semantic Regions in Indoor Environments without Object Recognition

no code yet • 11 Mar 2024

Robots require a semantic understanding of their surroundings to operate in an efficient and explainable way in human environments.

Textureless Object Recognition: An Edge-based Approach

no code yet • 10 Mar 2024

Achieving good accuracy in real time has been challenging for textureless objects because they lack discriminative features and reflectance properties, which makes techniques designed for textured object recognition insufficient.

A spatiotemporal style transfer algorithm for dynamic visual stimulus generation

no code yet • 7 Mar 2024

The algorithm is based on a two-stream deep neural network model that factorizes spatial and temporal features to generate dynamic visual stimuli whose model layer activations are matched to those of input videos.
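Matching a generated stimulus's layer activations to a target's is the core of neural style transfer. A common formulation (the Gram-matrix style loss of Gatys et al.) compares channel-by-channel correlations of feature maps, which ignores where a feature fired and keeps only its statistics. The sketch below shows that loss in pure Python. Whether this paper uses exactly this objective is an assumption; `two_stream_loss` and its `w_temporal` weight are hypothetical, illustrating how the same loss could be applied to a spatial (appearance) stream and a temporal (motion) stream. Normalization constants vary between implementations.

```python
from typing import List

FeatureMap = List[List[float]]  # C channels, each flattened over N positions


def gram_matrix(features: FeatureMap) -> List[List[float]]:
    """Channel-by-channel correlations, averaged over positions.

    Summing over positions discards *where* a feature fired, which is why
    Gram matrices capture texture statistics rather than layout.
    """
    c, n = len(features), len(features[0])
    return [
        [sum(features[i][k] * features[j][k] for k in range(n)) / n for j in range(c)]
        for i in range(c)
    ]


def style_loss(generated: FeatureMap, target: FeatureMap) -> float:
    """Mean squared difference between the two Gram matrices."""
    g, t = gram_matrix(generated), gram_matrix(target)
    c = len(g)
    return sum((g[i][j] - t[i][j]) ** 2 for i in range(c) for j in range(c)) / (c * c)


def two_stream_loss(
    gen_spatial: FeatureMap,
    tgt_spatial: FeatureMap,
    gen_temporal: FeatureMap,
    tgt_temporal: FeatureMap,
    w_temporal: float = 1.0,
) -> float:
    """Hypothetical: the same loss applied to appearance and motion features."""
    return style_loss(gen_spatial, tgt_spatial) + w_temporal * style_loss(gen_temporal, tgt_temporal)
```

Because positions are averaged out, permuting the positions of a feature map leaves the loss unchanged, so `style_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])` is zero.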

LoDisc: Learning Global-Local Discriminative Features for Self-Supervised Fine-Grained Visual Recognition

no code yet • 6 Mar 2024

In this paper, we propose incorporating subtle local fine-grained feature learning into global self-supervised contrastive learning through a purely self-supervised global-local fine-grained contrastive learning framework.
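Self-supervised contrastive learning is commonly trained with the InfoNCE loss: the anchor embedding should be more similar to its positive (another view of the same image) than to any negative. The sketch below implements InfoNCE with cosine similarity and a temperature. The `global_local_loss` helper and its `alpha` weight are hypothetical. They only illustrate summing a global-view term and a local-region term, and they are not LoDisc's actual method, which the excerpt does not specify.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def info_nce(
    anchor: Vector,
    positive: Vector,
    negatives: Sequence[Vector],
    temperature: float = 0.1,
) -> float:
    """Cross-entropy of picking the positive among the positive plus negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]


def global_local_loss(global_terms, local_terms, alpha: float = 0.5) -> float:
    """Hypothetical weighting of a global-view loss and a local-region loss.

    Each argument is an (anchor, positive, negatives) tuple.
    """
    return alpha * info_nce(*global_terms) + (1.0 - alpha) * info_nce(*local_terms)
```

A low temperature sharpens the softmax, so the loss is near zero when the positive is aligned with the anchor and the negative is orthogonal, and near `1 / temperature` when the roles are reversed.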

MiKASA: Multi-Key-Anchor & Scene-Aware Transformer for 3D Visual Grounding

no code yet • 5 Mar 2024

3D visual grounding involves matching natural language descriptions with their corresponding objects in 3D spaces.
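At its simplest, grounding can be framed as ranking candidate objects by how well their embeddings match the description's embedding. The sketch below assumes that some encoder (not shown, hypothetical) has already mapped the text and each 3D object into a shared vector space; `ground` and the object IDs are illustrative names. This is a baseline similarity ranking, not MiKASA's scene-aware transformer.

```python
import math
from typing import Dict, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def ground(text_embedding: Vector, objects: Dict[str, Vector]) -> Tuple[str, Dict[str, float]]:
    """Return the ID of the best-matching object and all similarity scores."""
    scores = {obj_id: cosine(text_embedding, emb) for obj_id, emb in objects.items()}
    best = max(scores, key=scores.get)
    return best, scores


best, scores = ground([1.0, 0.2], {"chair_3": [0.9, 0.1], "table_1": [0.0, 1.0]})
print(best, scores)
```

Scoring each object independently ignores spatial relations such as "the chair next to the window", which is the kind of scene context that dedicated grounding models aim to capture.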

Dual Pose-invariant Embeddings: Learning Category and Object-specific Discriminative Representations for Recognition and Retrieval

no code yet • 1 Mar 2024

This paper presents an attention-based dual-encoder architecture with specially designed loss functions that optimize the inter- and intra-class distances simultaneously in two different embedding spaces, one for the category embeddings and the other for the object-level embeddings.
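The idea of optimizing intra- and inter-class distances in two embedding spaces can be illustrated with a generic margin-based pairwise loss: same-label pairs are pulled together, and different-label pairs are pushed apart until they exceed a margin. This is a stand-in under stated assumptions, not the paper's specially designed losses. `intra_inter_loss`, `dual_space_loss`, `margin`, and `object_weight` are hypothetical names, and the object labels are assumed to be finer-grained than the category labels (for example, two distinct mugs share the category "mug").

```python
import math
from itertools import combinations
from typing import List, Sequence

Vector = Sequence[float]


def _mean(xs: List[float]) -> float:
    return sum(xs) / len(xs) if xs else 0.0


def intra_inter_loss(
    embeddings: Sequence[Vector], labels: Sequence[str], margin: float = 1.0
) -> float:
    """Pull same-label pairs together; push different-label pairs beyond `margin`."""
    intra, inter = [], []
    for (ea, la), (eb, lb) in combinations(zip(embeddings, labels), 2):
        d = math.dist(ea, eb)  # Euclidean distance (Python 3.8+)
        if la == lb:
            intra.append(d)  # intra-class: smaller is better
        else:
            inter.append(max(0.0, margin - d))  # inter-class: penalize only if too close
    return _mean(intra) + _mean(inter)


def dual_space_loss(
    category_emb: Sequence[Vector],
    object_emb: Sequence[Vector],
    category_labels: Sequence[str],
    object_labels: Sequence[str],
    margin: float = 1.0,
    object_weight: float = 1.0,
) -> float:
    """Hypothetical sum of the same objective in a category space and an object space."""
    return intra_inter_loss(category_emb, category_labels, margin) + object_weight * intra_inter_loss(
        object_emb, object_labels, margin
    )
```

Using two spaces lets two mugs sit close together in the category space while remaining separable in the object space, which a single embedding space would have to trade off.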

Unveiling Typographic Deceptions: Insights of the Typographic Vulnerability in Large Vision-Language Model

no code yet • 29 Feb 2024

Large Vision-Language Models (LVLMs) rely on vision encoders and Large Language Models (LLMs) to exhibit remarkable capabilities on various multi-modal tasks in the joint space of vision and language.

DOZE: A Dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments

no code yet • 29 Feb 2024

Zero-Shot Object Navigation (ZSON) requires agents to autonomously locate and approach unseen objects in unfamiliar environments and has emerged as a particularly challenging task within the domain of Embodied AI.

ISCUTE: Instance Segmentation of Cables Using Text Embedding

no code yet • 19 Feb 2024

In the field of robotics and automation, conventional object recognition and instance segmentation methods face a formidable challenge when it comes to perceiving Deformable Linear Objects (DLOs) like wires, cables, and flexible tubes.