Visual Relationship Detection

36 papers with code • 5 benchmarks • 5 datasets

Visual relationship detection (VRD) is a recently developed computer vision task that aims to recognize the relations or interactions between objects in an image. It builds on object recognition and is essential for fully understanding images and, more broadly, the visual world.
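
As a rough illustration of what a VRD system outputs, the sketch below represents one detected relation as a (subject, relation, object) triple over two localized objects. The class names, field names, box format, and example values are all hypothetical and chosen only for illustration; they are not taken from any of the listed papers or repositories.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class DetectedObject:
    """A localized object: a class label plus its bounding box."""
    label: str
    box: Box


@dataclass(frozen=True)
class Relationship:
    """One visual relationship linking a subject object to another object."""
    subject: DetectedObject
    predicate: str  # the relation or interaction, e.g. "riding"
    obj: DetectedObject

    def as_triple(self) -> Tuple[str, str, str]:
        # Drop the boxes and keep only the label-level triple.
        return (self.subject.label, self.predicate, self.obj.label)


# Made-up example values, not real detections.
person = DetectedObject("person", (10.0, 20.0, 110.0, 220.0))
horse = DetectedObject("horse", (40.0, 120.0, 260.0, 300.0))
rel = Relationship(person, "riding", horse)
print(rel.as_triple())
```

Keeping the boxes alongside the labels matters when an image contains several instances of the same class, since the label-level triple alone cannot say which instance is involved.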

Most implemented papers

Graphical Contrastive Losses for Scene Graph Parsing

dmlc/dgl CVPR 2019

The first, Entity Instance Confusion, occurs when the model confuses multiple instances of the same type of entity (e.g. multiple cups).

Exploring Long Tail Visual Relationship Recognition with Large Vocabulary

Vision-CAIR/LTVRR ICCV 2021

We use these benchmarks to study the performance of several state-of-the-art long-tail models on the LTVRR setup.

Compensating Supervision Incompleteness with Prior Knowledge in Semantic Image Interpretation

ivanDonadello/Visual-Relationship-Detection-LTN 1 Oct 2019

This requires the detection of visual relationships: triples (subject, relation, object) describing a semantic relation between a subject and an object.

One Metric to Measure them All: Localisation Recall Precision (LRP) for Evaluating Visual Detection Tasks

kemaloksuz/LRP-Error 21 Nov 2020

Despite being widely used as a performance measure for visual detection tasks, Average Precision (AP) is limited in (i) reflecting localisation quality, (ii) interpretability and (iii) robustness to the design choices regarding its computation, and its applicability to outputs without confidence scores.

Representing Prior Knowledge Using Randomly, Weighted Feature Networks for Visual Relationship Detection

pavliclab/aaai2022-clear2022-visual_relationship_detection-rwfn AAAI Workshop CLeaR 2022

Furthermore, background knowledge represented by RWFNs can be used to alleviate the incompleteness of training sets even though the space complexity of RWFNs is much smaller than LTNs (1:27 ratio).

Phrase Localization and Visual Relationship Detection with Comprehensive Image-Language Cues

BryanPlummer/pl-clc ICCV 2017

This paper presents a framework for localization or grounding of phrases in images using a large collection of linguistic and visual cues.

Deep Variation-structured Reinforcement Learning for Visual Relationship and Attribute Detection

nexusapoorvacus/DeepVariationStructuredRL CVPR 2017

To capture such global interdependency, we propose a deep Variation-structured Reinforcement Learning (VRL) framework to sequentially discover object relationships and attributes in the whole image.

Towards Context-Aware Interaction Recognition for Visual Relationship Detection

jingruixiaozhuang/iccv2017_vrd ICCV 2017

The proposed method still builds one classifier for one interaction (as per type (ii) above), but the classifier built is adaptive to context via weights which are context dependent.

Visual relationship detection with deep structural ranking

GriffinLiang/vrd-dsr 27 Apr 2018

In this paper, we propose a novel framework, called Deep Structural Ranking, for visual relationship detection.

Factorizable Net: An Efficient Subgraph-based Framework for Scene Graph Generation

yikang-li/FactorizableNet ECCV 2018

Generating scene graphs to describe all the relations inside an image has gained increasing interest in recent years.