Video Visual Relation Detection

7 papers with code • 2 benchmarks • 2 datasets

Video Visual Relation Detection (VidVRD) aims to detect instances of visual relations of interest in a video, where a visual relation instance is represented by a relation triplet <subject, predicate, object> together with the trajectories of the subject and object. Compared with still images, videos provide a more natural set of features for detecting visual relations, such as dynamic relations like “A-follow-B” and “A-towards-B”, and temporally changing relations like “A-chase-B” followed by “A-hold-B”. Yet VidVRD is technically more challenging than image visual relation detection (ImgVRD) because accurate object tracking is difficult and relation appearances are more diverse in the video domain.

Source: ImageNet-VidVRD Video Visual Relation Dataset
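
The representation described above (a triplet plus two trajectories) can be sketched as a small data structure. This is a minimal illustration, not the annotation format of any dataset: the class names, the `(x1, y1, x2, y2)` box convention, the half-open frame ranges, and the "dog chase frisbee" example are all assumptions made here for clarity.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed box convention: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class Trajectory:
    """A sequence of boxes for one object over consecutive frames."""
    start_frame: int   # index of the first frame (inclusive)
    boxes: List[Box]   # one box per consecutive frame

    @property
    def end_frame(self) -> int:
        # Exclusive end index, so the range is [start_frame, end_frame).
        return self.start_frame + len(self.boxes)


@dataclass
class RelationInstance:
    """A <subject, predicate, object> triplet with both trajectories."""
    subject: str
    predicate: str
    obj: str
    subject_traj: Trajectory
    object_traj: Trajectory

    def triplet(self) -> Tuple[str, str, str]:
        return (self.subject, self.predicate, self.obj)

    def temporal_span(self) -> Tuple[int, int]:
        """Frames where both trajectories exist, as [start, end)."""
        start = max(self.subject_traj.start_frame, self.object_traj.start_frame)
        end = min(self.subject_traj.end_frame, self.object_traj.end_frame)
        if end <= start:
            raise ValueError("subject and object trajectories do not overlap")
        return start, end


# Hypothetical example: the subject is tracked over frames 0-7,
# the object over frames 5-14.
example = RelationInstance(
    subject="dog",
    predicate="chase",
    obj="frisbee",
    subject_traj=Trajectory(start_frame=0, boxes=[(10.0, 20.0, 50.0, 60.0)] * 8),
    object_traj=Trajectory(start_frame=5, boxes=[(70.0, 25.0, 90.0, 45.0)] * 10),
)

print(example.triplet())
print(example.temporal_span())
```

Storing the start frame with the boxes keeps each trajectory self-describing, so the frames on which a relation can hold follow directly from the overlap of the two trajectories.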

Benchmarks

These leaderboards track progress in Video Visual Relation Detection.

Dataset	Best Model
ImageNet-VidVRD	Social Fabric
VidOR	Social Fabric

Datasets

Most implemented papers

Video Relationship Reasoning using Gated Spatio-Temporal Energy Graph

yaohungt/GSTEG_CVPR_2019 • CVPR 2019

Visual relationship reasoning is a crucial yet challenging task for understanding rich interactions across visual concepts.

LIGHTEN: Learning Interactions with Graph and Hierarchical TEmporal Networks for HOI in videos

praneeth11009/LIGHTEN-Learning-Interactions-with-Graphs-and-Hierarchical-TEmporal-Networks-for-HOI • 17 Dec 2020

Analyzing the interactions between humans and objects in a video includes identifying the relationships between the humans and the objects present in the video.

What and When to Look?: Temporal Span Proposal Network for Video Relation Detection

sangminwoo/Temporal-Span-Proposal-Network-VidVRD • 15 Jul 2021

TSPN tells when to look: it simultaneously predicts start-end timestamps (i.e., temporal spans) and categories of all possible relations by utilizing full video context.

Spatial-Temporal Transformer for Dynamic Scene Graph Generation

yrcong/sttran • ICCV 2021

Compared to the task of scene graph generation from images, dynamic scene graph generation is more challenging because of the dynamic relationships between objects and the temporal dependencies between frames, which allow for a richer semantic interpretation.

Social Fabric: Tubelet Compositions for Video Relation Detection

shanshuo/social-fabric • ICCV 2021

We also propose Social Fabric: an encoding that represents a pair of object tubelets as a composition of interaction primitives.

Video Relation Detection via Tracklet based Visual Transformer

dawn-lx/vidvrd-tracklets • 19 Aug 2021

Video Visual Relation Detection (VidVRD) has received significant attention from the community in recent years.

Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection

dawn-lx/openvoc-vidvrd • 1 Feb 2023

Without bells and whistles, our RePro achieves new state-of-the-art performance on two VidVRD benchmarks, not only on the base training object and predicate categories but also on the unseen ones.
