Visual Relationship Detection
36 papers with code • 5 benchmarks • 5 datasets
Visual relationship detection (VRD) is a recently developed computer vision task that aims to recognize relations or interactions between objects in an image. It builds on object recognition and is essential for fully understanding images, and ultimately the visual world.
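Concretely, a VRD system typically outputs (subject, predicate, object) triplets grounded by bounding boxes. As a minimal sketch (the class and field names here are illustrative assumptions, not taken from any of the papers below):

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical minimal representation of one VRD prediction: a
# (subject, predicate, object) triplet, with each entity grounded
# by a bounding box given as (x1, y1, x2, y2).
@dataclass
class Relationship:
    subject: str
    predicate: str
    obj: str
    subject_box: Tuple[int, int, int, int]
    object_box: Tuple[int, int, int, int]

# Example prediction for an image of a person riding a horse.
pred = Relationship("person", "riding", "horse",
                    subject_box=(40, 20, 120, 200),
                    object_box=(30, 90, 220, 260))
print((pred.subject, pred.predicate, pred.obj))  # ('person', 'riding', 'horse')
```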
Latest papers with no code
A Comprehensive Survey of Scene Graphs: Generation and Application
For example, given an image, we want to not only detect and recognize objects in the image, but also know the relationship between objects (visual relationship detection), and generate a text description (image captioning) based on the image content.
Towards Overcoming False Positives in Visual Relationship Detection
We observe that during training, the relationship proposal distribution is highly imbalanced: most negative relationship proposals are easy to identify, e.g., those arising from inaccurate object detection, which leads to under-fitting on the low-frequency, difficult proposals.
Constructing a Visual Relationship Authenticity Dataset
Visual relationship detection is crucial for scene understanding in images.
Hierarchical Graph Attention Network for Visual Relationship Detection
Object-level graph aims to capture the interactions between objects, while the triplet-level graph models the dependencies among relation triplets.
Fixed-size Objects Encoding for Visual Relationship Detection
Instead, we propose a novel method to encode all background objects in each image with one fixed-size vector (i.e., the FBE vector).
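The key property of such an encoding is that its size is independent of how many background objects an image contains. The paper's exact FBE construction is not given in this excerpt; a simple stand-in that preserves that property is mean pooling over per-object features (an illustrative assumption):

```python
import numpy as np

def encode_background(features: np.ndarray) -> np.ndarray:
    """Collapse a variable number of background-object feature vectors
    (shape: [num_objects, dim]) into one fixed-size vector by mean
    pooling. Illustrative stand-in for the FBE vector: the output size
    depends only on the feature dimension, not on the object count."""
    if features.size == 0:
        # No background objects: return an all-zero vector of the same dim.
        return np.zeros(features.shape[-1])
    return features.mean(axis=0)

# Three background objects with 4-d features collapse to one 4-d vector.
feats = np.array([[1., 0., 2., 1.],
                  [3., 2., 0., 1.],
                  [2., 1., 1., 1.]])
print(encode_background(feats))  # [2. 1. 1. 1.]
```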
Visual Relationship Detection using Scene Graphs: A Survey
In this paper, we present a detailed survey of the various techniques for scene graph generation, their efficacy in representing visual relationships, and how they have been used to solve various downstream tasks.
CPARR: Category-based Proposal Analysis for Referring Relationships
The task of referring relationships is to localize subject and object entities in an image satisfying a relationship query, which is given in the form <subject, predicate, object>.
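Such a query is structured text, so splitting it into its three components is straightforward. A minimal sketch, assuming the angle-bracket, comma-separated form shown above (the parser itself is illustrative, not from the paper):

```python
def parse_query(query: str):
    """Parse a '<subject, predicate, object>' relationship query into
    its three components, stripping the angle brackets and whitespace."""
    inner = query.strip().lstrip("<").rstrip(">")
    subject, predicate, obj = (part.strip() for part in inner.split(","))
    return subject, predicate, obj

print(parse_query("<person, holding, cup>"))  # ('person', 'holding', 'cup')
```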
Deep Adaptive Semantic Logic (DASL): Compiling Declarative Knowledge into Deep Neural Networks
We introduce Deep Adaptive Semantic Logic (DASL), a novel framework for automating the generation of deep neural networks that incorporates user-provided formal knowledge to improve learning from data.
ReLaText: Exploiting Visual Relationships for Arbitrary-Shaped Scene Text Detection with Graph Convolutional Networks
The key idea is to decompose text detection into two subproblems, namely detection of text primitives and prediction of link relationships between nearby text primitive pairs.
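Once links between nearby primitive pairs are predicted, full text instances can be recovered by grouping linked primitives into connected components. The GCN link predictor itself is omitted here; this sketch assumes `links` is a list of (i, j) pairs predicted as connected, and uses union-find (an illustrative grouping step, not the paper's exact post-processing):

```python
def group_primitives(num_primitives, links):
    """Group text primitives into text instances: each predicted link
    (i, j) merges the components of primitives i and j; the resulting
    connected components are the detected text instances."""
    parent = list(range(num_primitives))

    def find(x):
        # Find the component root, with path compression.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in links:
        parent[find(i)] = find(j)

    groups = {}
    for i in range(num_primitives):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Primitives 0-1-2 are linked into one word; 3-4 form another.
print(group_primitives(5, [(0, 1), (1, 2), (3, 4)]))  # [[0, 1, 2], [3, 4]]
```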
Visual Relationship Detection with Low Rank Non-Negative Tensor Decomposition
We address the problem of Visual Relationship Detection (VRD) which aims to describe the relationships between pairs of objects in the form of triplets of (subject, predicate, object).