We demonstrate the practical value of our fusion model by using BLOCK for two challenging tasks: Visual Question Answering (VQA) and Visual Relationship Detection (VRD), where we design end-to-end learnable architectures for representing relevant interactions between modalities.
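BLOCK belongs to the family of bilinear fusion models. As a rough illustration only, the sketch below shows a plain low-rank bilinear fusion of two modality vectors; it is not the block-term decomposition used by the actual BLOCK model, and all names and shapes (q, v, U, V, P, rank) are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch of low-rank bilinear fusion between two modalities.
# NOTE: this is a simplified stand-in, not the BLOCK block-term decomposition;
# all dimensions and variable names below are assumptions for the example.
rng = np.random.default_rng(0)
d_q, d_v, rank, d_out = 16, 32, 8, 10

q = rng.standard_normal(d_q)           # e.g. a question embedding
v = rng.standard_normal(d_v)           # e.g. a visual feature
U = rng.standard_normal((d_q, rank))   # projects the question modality
V = rng.standard_normal((d_v, rank))   # projects the visual modality
P = rng.standard_normal((rank, d_out)) # projects the fused representation

# Fuse by element-wise product of the two projections, then project out.
fused = (q @ U) * (v @ V)  # shape: (rank,)
output = fused @ P         # shape: (d_out,)
print(output.shape)
```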
Generating scene graphs to describe all the relations inside an image has gained increasing interest in recent years.
In this paper, we propose a novel framework, called Deep Structural Ranking, for visual relationship detection. Ranked #1 on Scene Graph Generation on VRD.
This requires the detection of visual relationships: triples (subject, relation, object) describing a semantic relation between a subject and an object.
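For concreteness, here is a minimal sketch of such a triple as a data structure; the class and field names are illustrative and not taken from any of the listed papers.

```python
from dataclasses import dataclass

# Illustrative representation of a (subject, relation, object) triple.
# Names are hypothetical, not from any of the cited papers.
@dataclass(frozen=True)
class RelationshipTriple:
    subject: str   # detected subject category, e.g. "person"
    relation: str  # predicate linking subject and object, e.g. "riding"
    obj: str       # detected object category, e.g. "horse"

# Example: relationships a detector might emit for one image.
triples = [
    RelationshipTriple("person", "riding", "horse"),
    RelationshipTriple("horse", "on", "grass"),
]
for t in triples:
    print(f"({t.subject}, {t.relation}, {t.obj})")
```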
To capture such global interdependency, we propose a deep Variation-structured Reinforcement Learning (VRL) framework to sequentially discover object relationships and attributes in the whole image.
We present Open Images V4, a dataset of 9.2M images with unified annotations for image classification, object detection and visual relationship detection.
This paper presents a framework for localization or grounding of phrases in images using a large collection of linguistic and visual cues.
We propose to study the task of Long-Tail Visual Relationship Recognition (LTVRR), which aims at generalizing on the structured long-tail distribution of visual relationships (e.g., "rabbit grazing on grass").
Visual relationship detection, a challenging task that finds and distinguishes the interactions between object pairs in an image, has received much attention recently.
Analyzing the interactions between humans and objects in a video involves identifying the relationships between the humans and the objects present in it.