Object Localization
231 papers with code • 18 benchmarks • 17 datasets
Object Localization is the task of locating an instance of a particular object category in an image, typically by specifying a tightly cropped bounding box centered on the instance. An object proposal specifies a candidate bounding box; a proposal is considered a correct localization if it sufficiently overlaps a human-labeled "ground-truth" bounding box for the given object. In the literature, "Object Localization" refers to locating a single instance of an object category, whereas "object detection" focuses on locating all instances of a category in a given image.
Source: Fast On-Line Kernel Density Estimation for Active Object Localization
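The overlap criterion above is usually measured as Intersection-over-Union (IoU) between the proposal and the ground-truth box. As a minimal sketch (the page itself does not fix a threshold; 0.5 is the common PASCAL VOC convention, and the function names here are illustrative):

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero so non-overlapping boxes yield zero intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_correct_localization(proposal, ground_truth, threshold=0.5):
    # 0.5 is the common PASCAL VOC threshold, used here as an assumption.
    return iou(proposal, ground_truth) >= threshold
```

For example, a proposal shifted by half a box width against a 10x10 ground truth gives IoU = 25 / 175 ≈ 0.14, well below the 0.5 threshold.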
Latest papers
Bilateral Reference for High-Resolution Dichotomous Image Segmentation
It comprises two essential components: the localization module (LM) and the reconstruction module (RM) with our proposed bilateral reference (BiRef).
LangSplat: 3D Language Gaussian Splatting
Humans live in a 3D world and commonly use natural language to interact with a 3D scene.
Dual Attention U-Net with Feature Infusion: Pushing the Boundaries of Multiclass Defect Segmentation
The proposed architecture, Dual Attentive U-Net with Feature Infusion (DAU-FI Net), addresses challenges in semantic segmentation, particularly on multiclass imbalanced datasets with limited samples.
Object-Aware Domain Generalization for Object Detection
To address these problems, we propose an object-aware domain generalization (OA-DG) method for single-domain generalization in object detection.
Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance
We introduce Open3DIS, a novel solution designed to tackle the problem of Open-Vocabulary Instance Segmentation within 3D scenes.
Mono3DVG: 3D Visual Grounding in Monocular Images
To foster this task, we propose Mono3DVG-TR, an end-to-end transformer-based network, which takes advantage of both the appearance and geometry information in text embeddings for multi-modal learning and 3D object localization.
Boosting Segment Anything Model Towards Open-Vocabulary Learning
The recent Segment Anything Model (SAM) has emerged as a new paradigmatic vision foundation model, showcasing potent zero-shot generalization and flexible prompting.
Grounding Everything: Emerging Localization Properties in Vision-Language Transformers
To leverage those capabilities, we propose a Grounding Everything Module (GEM) that generalizes the idea of value-value attention introduced by CLIPSurgery to a self-self attention path.
Point, Segment and Count: A Generalized Framework for Object Counting
In this paper, we propose a generalized framework for both few-shot and zero-shot object counting based on detection.
Towards Learning Monocular 3D Object Localization From 2D Labels using the Physical Laws of Motion
We present a novel method for precise 3D object localization in single images from a single calibrated camera using only 2D labels.