Object Localization
234 papers with code • 18 benchmarks • 17 datasets
Object Localization is the task of locating an instance of a particular object category in an image, typically by specifying a tightly cropped bounding box centered on the instance. An object proposal specifies a candidate bounding box, and an object proposal is said to be a correct localization if it sufficiently overlaps a human-labeled “ground-truth” bounding box for the given object. In the literature, the “Object Localization” task is to locate one instance of an object category, whereas “object detection” focuses on locating all instances of a category in a given image.
Source: Fast On-Line Kernel Density Estimation for Active Object Localization
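The definition above says a proposal is correct when it "sufficiently overlaps" a ground-truth box, without fixing a measure or a cutoff. A minimal sketch of one common reading follows, assuming overlap is measured as intersection-over-union (IoU) and that the threshold is 0.5. Both choices are assumptions for illustration, not taken from the source, and the names `Box`, `iou`, and `is_correct_localization` are hypothetical.

```python
from typing import Sequence, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Degenerate (zero-area) boxes give no meaningful overlap.
    return inter / union if union > 0 else 0.0


def is_correct_localization(
    proposal: Box,
    ground_truths: Sequence[Box],
    threshold: float = 0.5,  # assumed cutoff; the source only says "sufficiently"
) -> bool:
    """True if the proposal overlaps at least one ground-truth box enough.

    Localization needs a single matching instance, so one hit suffices;
    detection would instead have to account for every ground-truth box.
    """
    return any(iou(proposal, gt) >= threshold for gt in ground_truths)
```

For example, `is_correct_localization((0, 0, 10, 10), [(1, 1, 10, 10)])` compares a proposal against a single labeled box and counts it as correct under the assumed 0.5 threshold.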
Latest papers
Grounding Everything: Emerging Localization Properties in Vision-Language Transformers
To leverage those capabilities, we propose a Grounding Everything Module (GEM) that generalizes the idea of value-value attention introduced by CLIPSurgery to a self-self attention path.
Point, Segment and Count: A Generalized Framework for Object Counting
In this paper, we propose a generalized framework for both few-shot and zero-shot object counting based on detection.
Towards Learning Monocular 3D Object Localization From 2D Labels using the Physical Laws of Motion
We present a novel method for precise 3D object localization in single images from a single calibrated camera using only 2D labels.
Unsupervised Object Localization in the Era of Self-Supervised ViTs: A Survey
We propose here a survey of unsupervised object localization methods that discover objects in images without requiring any manual annotation in the era of self-supervised ViTs.
DiPS: Discriminative Pseudo-Label Sampling with Self-Supervised Transformers for Weakly Supervised Object Localization
Subsequently, these proposals are used as pseudo-labels to train our new transformer-based WSOL model designed to perform classification and localization tasks.
CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection
Open-vocabulary 3D Object Detection (OV-3DDet) aims to detect objects from an arbitrary list of categories within a 3D scene, which remains seldom explored in the literature.
Learning to Terminate in Object Navigation
This paper tackles the critical challenge of object navigation in autonomous navigation systems, particularly focusing on the problem of target approach and episode termination in environments with long optimal episode length in Deep Reinforcement Learning (DRL) based methods.
Context-Aware Entity Grounding with Open-Vocabulary 3D Scene Graphs
We present an Open-Vocabulary 3D Scene Graph (OVSG), a formal framework for grounding a variety of entities, such as object instances, agents, and regions, with free-form text-based queries.
CLIP-DIY: CLIP Dense Inference Yields Open-Vocabulary Semantic Segmentation For-Free
The emergence of CLIP has opened the way for open-world image perception.
Background Activation Suppression for Weakly Supervised Object Localization and Semantic Segmentation
In addition, our method also achieves state-of-the-art weakly supervised semantic segmentation performance on the PASCAL VOC 2012 and MS COCO 2014 datasets.