Object Localization

237 papers with code • 18 benchmarks • 17 datasets

Object Localization is the task of locating an instance of a particular object category in an image, typically by specifying a tightly cropped bounding box centered on the instance. An object proposal specifies a candidate bounding box, and it counts as a correct localization if it sufficiently overlaps a human-labeled “ground-truth” bounding box for the given object. In the literature, “object localization” means locating one instance of an object category, whereas “object detection” means locating all instances of a category in a given image.

Source: Fast On-Line Kernel Density Estimation for Active Object Localization
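
The overlap criterion in the definition above is commonly measured with intersection over union (IoU). The following is a minimal sketch of that check, not taken from the cited source: the `(x_min, y_min, x_max, y_max)` box layout, the function names, and the 0.5 default threshold are assumptions chosen for illustration (the definition only says "sufficiently overlaps").

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are assumed to be (x_min, y_min, x_max, y_max) tuples.
    """
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Width and height are clamped at zero when the boxes do not overlap.
    inter_w = max(0.0, ix2 - ix1)
    inter_h = max(0.0, iy2 - iy1)
    inter = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


def is_correct_localization(proposal, ground_truth, threshold=0.5):
    """True if the proposal overlaps the ground truth by at least `threshold`.

    The 0.5 default is an assumed, commonly used value, not one fixed by
    the definition above.
    """
    return iou(proposal, ground_truth) >= threshold


if __name__ == "__main__":
    ground_truth = (0, 0, 10, 10)
    print(is_correct_localization((5, 0, 15, 10), ground_truth))  # half-shifted box
    print(is_correct_localization((0, 0, 10, 8), ground_truth))   # slightly short box
```

With these inputs, the half-shifted proposal has an IoU of about 0.33 and fails the check, while the slightly short proposal has an IoU of 0.8 and passes.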

Latest papers with no code

Three Ways to Improve Verbo-visual Fusion for Dense 3D Visual Grounding

no code yet • 8 Sep 2023

A common formulation to tackle 3D visual grounding is grounding-by-detection, where localization is done via bounding boxes.

Semantic-Constraint Matching Transformer for Weakly Supervised Object Localization

no code yet • 4 Sep 2023

Weakly supervised object localization (WSOL) strives to learn to localize objects with only image-level supervision.

I3DOD: Towards Incremental 3D Object Detection via Prompting

no code yet • 24 Aug 2023

Meanwhile, the current class-incremental 3D object detection methods neglect the relationships between the object localization information and category semantic information and assume all the knowledge of the old model is reliable.

Video OWL-ViT: Temporally-consistent open-world localization in video

no code yet • ICCV 2023

Our model is end-to-end trainable on video data and enjoys improved temporal consistency compared to tracking-by-detection baselines, while retaining the open-world capabilities of the backbone detector.

Towards Grounded Visual Spatial Reasoning in Multi-Modal Vision Language Models

no code yet • 18 Aug 2023

In this work, we show qualitatively (using explainability tools) and quantitatively (using object detectors) that the poor object localization "grounding" ability of the models is a contributing factor to the poor image-text matching performance.

Leveraging Next-Active Objects for Context-Aware Anticipation in Egocentric Videos

no code yet • 16 Aug 2023

Compared to existing video modeling architectures for action anticipation, NAOGAT captures the relationship between objects and the global scene context in order to predict detections for the next active object and anticipate relevant future actions given these detections, leveraging the objects' dynamics to improve accuracy.

Rethinking the Localization in Weakly Supervised Object Localization

no code yet • 11 Aug 2023

Weakly supervised object localization (WSOL) is one of the most popular and challenging tasks in computer vision.

Rapid Training Data Creation by Synthesizing Medical Images for Classification and Localization

no code yet • 9 Aug 2023

We show the efficacy of this approach on both a weakly supervised localization model and a strongly supervised localization model.

A Memory-Augmented Multi-Task Collaborative Framework for Unsupervised Traffic Accident Detection in Driving Videos

no code yet • 27 Jul 2023

Different from previous approaches, our method can more accurately detect both ego-involved and non-ego accidents by simultaneously modeling appearance changes and object motions in video frames through the collaboration of optical flow reconstruction and future object localization tasks.

MPDIoU: A Loss for Efficient and Accurate Bounding Box Regression

no code yet • 14 Jul 2023

Bounding box regression (BBR), an important step in object localization, has been widely used in object detection and instance segmentation.
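
To illustrate what a bounding box regression loss can look like, here is a minimal sketch of an IoU-based loss with an added corner-distance penalty. It is a generic illustration and should not be read as the formulation from the paper listed above. The function names, the `(x_min, y_min, x_max, y_max)` box layout, and the normalization by the squared image diagonal are all assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union for (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def corner_penalized_iou_loss(pred, target, img_w, img_h):
    """Hypothetical loss: 1 - IoU plus normalized squared corner distances.

    A plain (1 - IoU) loss is constant for all non-overlapping boxes, so it
    cannot tell a near miss from a far miss. The corner terms below keep the
    loss sensitive to distance in that case.
    """
    # Squared distance between the two top-left corners.
    d_top_left = (pred[0] - target[0]) ** 2 + (pred[1] - target[1]) ** 2
    # Squared distance between the two bottom-right corners.
    d_bottom_right = (pred[2] - target[2]) ** 2 + (pred[3] - target[3]) ** 2
    # Assumed normalizer: squared image diagonal, so the penalty is scale-free.
    norm = img_w ** 2 + img_h ** 2
    return 1.0 - iou(pred, target) + d_top_left / norm + d_bottom_right / norm


if __name__ == "__main__":
    target = (0, 0, 10, 10)
    near_miss = (20, 0, 30, 10)
    far_miss = (60, 0, 70, 10)
    print(corner_penalized_iou_loss(near_miss, target, 100, 100))
    print(corner_penalized_iou_loss(far_miss, target, 100, 100))
```

Neither prediction overlaps the target, so a plain IoU loss would score them identically. The corner penalty gives the farther box a larger loss, which is the kind of signal a regressor needs to move a box toward its target.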