Visual Relationship Detection
36 papers with code • 5 benchmarks • 5 datasets
Visual relationship detection (VRD) is a recently developed computer vision task that aims to recognize relations or interactions between objects in an image. It builds on object recognition and is essential for fully understanding images and, more broadly, the visual world.
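As a sketch of the task's output format: a VRD system typically returns <subject, predicate, object> triplets over detected objects. The data structures and example values below are illustrative, not taken from any particular paper or library.

```python
from dataclasses import dataclass

@dataclass
class DetectedObject:
    label: str
    box: tuple  # (x1, y1, x2, y2) bounding box in pixel coordinates

@dataclass
class RelationshipTriplet:
    subject: DetectedObject
    predicate: str
    obj: DetectedObject

# Hypothetical detections and one predicted relationship between them.
person = DetectedObject("person", (10, 20, 110, 220))
horse = DetectedObject("horse", (80, 40, 300, 260))
triplet = RelationshipTriplet(person, "riding", horse)

print(f"<{triplet.subject.label}, {triplet.predicate}, {triplet.obj.label}>")
```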
Most implemented papers
The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale
We present Open Images V4, a dataset of 9.2M images with unified annotations for image classification, object detection and visual relationship detection.
BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection
We demonstrate the practical interest of our fusion model by using BLOCK for two challenging tasks: Visual Question Answering (VQA) and Visual Relationship Detection (VRD), where we design end-to-end learnable architectures for representing relevant interactions between modalities.
Deeply Supervised Multimodal Attentional Translation Embeddings for Visual Relationship Detection
Detecting visual relationships, i.e., <Subject, Predicate, Object> triplets, is a challenging Scene Understanding task approached in the past via linguistic priors or spatial information in a single feature branch.
Visual Relationship Detection with Language prior and Softmax
Visual relationship detection is an intermediate image understanding task that detects two objects and classifies a predicate that explains the relationship between two objects in an image.
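Predicate classification of the kind described above can be sketched as a softmax over candidate predicates for a given object pair. The predicate vocabulary and scores below are made up for illustration; this is not the specific model from the paper.

```python
import math

def softmax(scores):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical predicate vocabulary and logits for one detected object pair.
predicates = ["on", "riding", "holding", "next to"]
scores = [1.2, 3.5, 0.3, 0.8]

probs = softmax(scores)
best = predicates[probs.index(max(probs))]  # highest-probability predicate
```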
Improving Visual Relation Detection using Depth Maps
We argue that depth maps can additionally provide valuable information on object relations, e.g., helping to detect not only spatial relations, such as standing behind, but also non-spatial relations, such as holding.
Visual Relationship Detection with Relative Location Mining
Visual relationship detection, a challenging task that aims to find and distinguish interactions between object pairs in an image, has received much attention recently.
NODIS: Neural Ordinary Differential Scene Understanding
Detected objects, their labels and the discovered relations can be used to construct a scene graph which provides an abstract semantic interpretation of an image.
AVR: Attention based Salient Visual Relationship Detection
To address this problem, we propose an attention-based model, namely AVR, to detect salient visual relationships based on both the local and global context of the relationships.
Explanation-based Weakly-supervised Learning of Visual Relations with Graph Networks
Visual relationship detection is fundamental for holistic image understanding.
Visual Relationship Detection with Visual-Linguistic Knowledge from Multimodal Representations
Visual relationship detection aims to reason over relationships among salient objects in images, which has drawn increasing attention over the past few years.