Visual Relationship Detection
36 papers with code • 5 benchmarks • 5 datasets
Visual relationship detection (VRD) is a recently developed computer vision task that aims to recognize relations or interactions between objects in an image. It builds on object recognition and is essential for fully understanding images, and ultimately the visual world.
Latest papers
Spatial-Temporal Transformer for Dynamic Scene Graph Generation
Compared to the task of scene graph generation from still images, dynamic scene graph generation is more challenging: relationships between objects change over time, and the temporal dependencies between frames allow for a richer semantic interpretation.
Recovering the Unbiased Scene Graphs from the Biased Ones
Given input images, scene graph generation (SGG) aims to produce comprehensive, graphical representations describing visual relationships among salient objects.
2.5D Visual Relationship Detection
To enable progress on this task, we create a new dataset consisting of 220K human-annotated 2.5D relationships among 512K objects from 11K images.
RelTransformer: A Transformer-Based Long-Tail Visual Relationship Recognition
This paper shows that modeling an effective message-passing flow through an attention mechanism can be critical to tackling the compositionality and long-tail challenges in VRR.
Grounding Consistency: Distilling Spatial Common Sense for Precise Visual Relationship Detection
Scene Graph Generators (SGGs) are models that, given an image, build a directed graph where each edge represents a predicted ⟨subject, predicate, object⟩ triplet.
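The graph structure described above can be sketched as a directed multigraph over ⟨subject, predicate, object⟩ triplets. This is a minimal illustrative data structure, not taken from any particular SGG implementation; all names are assumptions.

```python
# Minimal sketch of a scene graph as a directed multigraph of
# <subject, predicate, object> triplets. Class and method names
# are illustrative, not from any specific SGG codebase.
from collections import defaultdict

class SceneGraph:
    def __init__(self):
        # adjacency: subject node -> list of (predicate, object node) edges
        self.edges = defaultdict(list)

    def add_triplet(self, subject, predicate, obj):
        # each triplet becomes one directed, predicate-labeled edge
        self.edges[subject].append((predicate, obj))

    def relations_of(self, subject):
        # all outgoing relations predicted for this subject
        return self.edges[subject]

g = SceneGraph()
g.add_triplet("person", "riding", "horse")
g.add_triplet("person", "wearing", "hat")
print(g.relations_of("person"))  # [('riding', 'horse'), ('wearing', 'hat')]
```

A real SGG additionally attaches bounding boxes and confidence scores to nodes and edges; the multigraph form matters because the same object pair can participate in several predicates at once.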
LIGHTEN: Learning Interactions with Graph and Hierarchical TEmporal Networks for HOI in videos
Analyzing human-object interactions in a video requires identifying the relationships between the humans and objects present in it.
One Metric to Measure them All: Localisation Recall Precision (LRP) for Evaluating Visual Detection Tasks
Despite being widely used as a performance measure for visual detection tasks, Average Precision (AP) is limited in (i) reflecting localisation quality, (ii) interpretability, and (iii) robustness to the design choices regarding its computation; moreover, it is not applicable to outputs without confidence scores.
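Limitation (i) can be illustrated concretely: AP-style matching marks a detection as correct whenever its IoU with the ground truth clears a fixed threshold, so a barely-overlapping box and a near-perfect box count identically. The sketch below demonstrates this with a plain IoU computation; it is an illustration of the limitation, not an implementation of the LRP metric itself.

```python
# Illustrative sketch (not the LRP metric): with a fixed IoU threshold,
# AP-style matching treats a loose box and a tight box identically,
# which is the localisation-quality limitation noted above.
def iou(a, b):
    # boxes as (x1, y1, x2, y2)
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

gt = (0, 0, 10, 10)
tight = (0, 0, 10, 9)    # IoU = 0.9: near-perfect localisation
loose = (0, 3, 10, 13)   # IoU ≈ 0.54: barely clears the threshold
thr = 0.5

# Both detections are "correct" under the threshold, despite very
# different localisation quality -- AP cannot tell them apart.
print(iou(gt, tight) >= thr, iou(gt, loose) >= thr)  # True True
```

LRP addresses this by folding the matched detections' IoU values directly into the error measure instead of discarding them after thresholding.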
Visualization of Contributions to Open-Source Projects
We want to analyze visually to what extent team members and external developers contribute to open-source projects.
Visual Relationship Detection with Visual-Linguistic Knowledge from Multimodal Representations
Visual relationship detection aims to reason over relationships among salient objects in images, which has drawn increasing attention over the past few years.
Explanation-based Weakly-supervised Learning of Visual Relations with Graph Networks
Visual relationship detection is fundamental for holistic image understanding.