Visual Relationship Detection

36 papers with code • 5 benchmarks • 5 datasets

Visual relationship detection (VRD) is a recently developed computer vision task that aims to recognize relations or interactions between objects in an image. It builds on object recognition and is essential for fully understanding images and, more broadly, the visual world.

Most implemented papers

Visualization of Contributions to Open-Source Projects

onyame/Git2PROV 17 Oct 2020

We want to analyze visually to what extent team members and external developers contribute to open-source projects.

LIGHTEN: Learning Interactions with Graph and Hierarchical TEmporal Networks for HOI in videos

praneeth11009/LIGHTEN-Learning-Interactions-with-Graphs-and-Hierarchical-TEmporal-Networks-for-HOI 17 Dec 2020

Analyzing the interactions between humans and objects from a video includes identification of the relationships between humans and the objects present in the video.

Grounding Consistency: Distilling Spatial Common Sense for Precise Visual Relationship Detection

deeplab-ai/grounding-consistent-vrd ICCV 2021

Scene Graph Generators (SGGs) are models that, given an image, build a directed graph where each edge represents a predicted subject-predicate-object triplet.
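The directed-graph output described here can be pictured with a small data structure. The following is only a minimal sketch, not the paper's implementation: the `Triplet` type, the `build_scene_graph` helper, and the example labels are all hypothetical names chosen for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Triplet(NamedTuple):
    """One predicted relationship (hypothetical container for illustration)."""
    subj: str  # subject node, e.g. "person"
    pred: str  # edge label, e.g. "rides"
    obj: str   # object node, e.g. "horse"


def build_scene_graph(triplets):
    """Store triplets as an adjacency list: subject -> [(predicate, object), ...]."""
    graph = defaultdict(list)
    for t in triplets:
        # Each triplet becomes one directed, labeled edge from subject to object.
        graph[t.subj].append((t.pred, t.obj))
    return dict(graph)


# Made-up predictions for a single image.
triplets = [
    Triplet("person", "rides", "horse"),
    Triplet("horse", "on", "grass"),
]
scene_graph = build_scene_graph(triplets)
```

In this representation, nodes that never act as a subject (here "grass") have no outgoing edges and so do not appear as keys.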

RelTransformer: A Transformer-Based Long-Tail Visual Relationship Recognition

Vision-CAIR/RelTransformer CVPR 2022

This paper shows that modeling an effective message-passing flow through an attention mechanism can be critical to tackling the compositionality and long-tail challenges in visual relationship recognition (VRR).

2.5D Visual Relationship Detection

google-research-datasets/2.5vrd 26 Apr 2021

To enable progress on this task, we create a new dataset consisting of 220K human-annotated 2.5D relationships among 512K objects from 11K images.

Recovering the Unbiased Scene Graphs from the Biased Ones

coldmanck/recovering-unbiased-scene-graphs 5 Jul 2021

Given input images, scene graph generation (SGG) aims to produce comprehensive, graphical representations describing visual relationships among salient objects.

Spatial-Temporal Transformer for Dynamic Scene Graph Generation

yrcong/sttran ICCV 2021

Compared to the task of scene graph generation from images, dynamic scene graph generation is more challenging because of the dynamic relationships between objects and the temporal dependencies between frames, which allow for a richer semantic interpretation.

Image Scene Graph Generation (SGG) Benchmark

microsoft/scene_graph_benchmark 27 Jul 2021

There is a surge of interest in image scene graph generation (object, attribute and relationship detection) due to the need to build fine-grained image understanding models that go beyond object detection.

PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models

thunlp/pevl 23 May 2022

We show that PEVL enables state-of-the-art performance of detector-free VLP models on position-sensitive tasks such as referring expression comprehension and phrase grounding, and also improves the performance on position-insensitive tasks with grounded inputs.

Neural Message Passing for Visual Relationship Detection

phyllish/nmp 8 Aug 2022

Visual relationship detection aims to detect the interactions between objects in an image; however, this task suffers from combinatorial explosion due to the variety of objects and interactions.
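The combinatorial explosion mentioned here can be made concrete with a little arithmetic. The sketch below assumes that every relationship class is an (object category, predicate, object category) combination; the function name and the vocabulary sizes are hypothetical and chosen only for illustration.

```python
def count_triplet_classes(num_object_classes: int, num_predicate_classes: int) -> int:
    """Count possible (subject category, predicate, object category) combinations."""
    # The subject and the object are each drawn from the same category vocabulary,
    # so the count grows quadratically in the number of object categories.
    return num_object_classes * num_object_classes * num_predicate_classes


# Hypothetical vocabulary sizes: 100 object categories and 70 predicates.
total = count_triplet_classes(100, 70)
print(total)  # 700000
```

Even with modest vocabularies, the number of distinct triplet classes far exceeds what a dataset can cover evenly, which is why treating each triplet as its own class scales poorly.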