Object Localization
234 papers with code • 18 benchmarks • 17 datasets
Object Localization is the task of locating an instance of a particular object category in an image, typically by specifying a tightly cropped bounding box centered on the instance. An object proposal specifies a candidate bounding box, and an object proposal is said to be a correct localization if it sufficiently overlaps a human-labeled “ground-truth” bounding box for the given object. In the literature, the “Object Localization” task is to locate one instance of an object category, whereas “object detection” focuses on locating all instances of a category in a given image.
Source: Fast On-Line Kernel Density Estimation for Active Object Localization
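The definition above leaves "sufficiently overlaps" unspecified. A common way to make it concrete is intersection over union (IoU) between the proposal and the ground-truth box, with a proposal counted as correct when IoU reaches a threshold. The sketch below is illustrative only: the `(x_min, y_min, x_max, y_max)` box format, the function names, and the default threshold of 0.5 (the PASCAL VOC convention) are assumptions, not details taken from the cited source.

```python
from typing import Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the intersection rectangle (may be empty).
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])

    # Clamp to zero so disjoint boxes give zero intersection area.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Guard against degenerate (zero-area) boxes.
    return inter / union if union > 0 else 0.0


def is_correct_localization(proposal: Box, ground_truth: Box,
                            threshold: float = 0.5) -> bool:
    """True if the proposal overlaps the ground truth by at least `threshold` IoU.

    The 0.5 default is an assumed convention; benchmarks may use other values.
    """
    return iou(proposal, ground_truth) >= threshold
```

For example, `is_correct_localization((0, 0, 10, 10), (0, 0, 10, 8))` compares a proposal that fully covers a slightly shorter ground-truth box. Stricter benchmarks raise the threshold, which penalizes loosely fitting boxes more heavily.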
Latest papers
Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data
Large Vision-Language Models (VLMs) have demonstrated impressive performance on complex tasks involving visual input with natural language instructions.
CPR++: Object Localization via Single Coarse Point Supervision
CPR reduces the semantic variance by selecting a semantic centre point in a neighbourhood region to replace the initial annotated point.
Spatial Structure Constraints for Weakly Supervised Semantic Segmentation
In this paper, we propose spatial structure constraints (SSC) for weakly supervised semantic segmentation to alleviate the unwanted object over-activation of attention expansion.
Bilateral Reference for High-Resolution Dichotomous Image Segmentation
It comprises two essential components: the localization module (LM) and the reconstruction module (RM) with our proposed bilateral reference (BiRef).
LangSplat: 3D Language Gaussian Splatting
Humans live in a 3D world and commonly use natural language to interact with a 3D scene.
Dual Attention U-Net with Feature Infusion: Pushing the Boundaries of Multiclass Defect Segmentation
The proposed architecture, Dual Attentive U-Net with Feature Infusion (DAU-FI Net), addresses challenges in semantic segmentation, particularly on multiclass imbalanced datasets with limited samples.
Object-Aware Domain Generalization for Object Detection
To address these problems, we propose an object-aware domain generalization (OA-DG) method for single-domain generalization in object detection.
Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance
We introduce Open3DIS, a novel solution designed to tackle the problem of Open-Vocabulary Instance Segmentation within 3D scenes.
Mono3DVG: 3D Visual Grounding in Monocular Images
To foster this task, we propose Mono3DVG-TR, an end-to-end transformer-based network, which takes advantage of both the appearance and geometry information in text embeddings for multi-modal learning and 3D object localization.
Boosting Segment Anything Model Towards Open-Vocabulary Learning
The recent Segment Anything Model (SAM) has emerged as a new paradigmatic vision foundation model, showcasing potent zero-shot generalization and flexible prompting.