Scene Graph Generation

113 papers with code • 5 benchmarks • 7 datasets

A scene graph is a structured representation of an image: nodes correspond to object bounding boxes with their object categories, and edges correspond to pairwise relationships between those objects. The task of Scene Graph Generation is to generate a visually grounded scene graph that most accurately represents an image.
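The structure described above can be sketched in a few lines of Python. This is a minimal illustration, not the data format of any library listed on this page: the names `ObjectNode`, `SceneGraph`, `add_object`, `relate` and `triplets` are hypothetical, and boxes are assumed to be `(x1, y1, x2, y2)` pixel coordinates.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ObjectNode:
    """A node: an object bounding box, assumed (x1, y1, x2, y2), plus its category."""
    box: tuple[float, float, float, float]
    category: str


@dataclass
class SceneGraph:
    """Hypothetical container: nodes are objects, edges are (subject_idx, predicate, object_idx)."""
    nodes: list[ObjectNode] = field(default_factory=list)
    edges: list[tuple[int, str, int]] = field(default_factory=list)

    def add_object(self, box, category: str) -> int:
        """Add a grounded object node and return its index."""
        self.nodes.append(ObjectNode(tuple(box), category))
        return len(self.nodes) - 1

    def relate(self, subj: int, predicate: str, obj: int) -> None:
        """Add a directed pairwise relationship between two existing nodes."""
        if not (0 <= subj < len(self.nodes) and 0 <= obj < len(self.nodes)):
            raise IndexError("edge endpoints must be existing nodes")
        self.edges.append((subj, predicate, obj))

    def triplets(self) -> list[tuple[str, str, str]]:
        """Return the graph as (subject category, predicate, object category) triplets."""
        return [(self.nodes[s].category, p, self.nodes[o].category) for s, p, o in self.edges]


# Usage: a one-edge graph for an image of a man riding a horse (made-up boxes).
g = SceneGraph()
man = g.add_object((10, 20, 110, 300), "man")
horse = g.add_object((80, 120, 400, 380), "horse")
g.relate(man, "riding", horse)
print(g.triplets())  # [('man', 'riding', 'horse')]
```

Keeping edges as index triples rather than category triples keeps each relationship tied to a specific bounding box, which is what makes the graph visually grounded: two "man" nodes in the same image remain distinct.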

Source: Scene Graph Generation by Iterative Message Passing

Benchmarks


These leaderboards track progress in Scene Graph Generation.

Dataset	Best Model
Visual Genome	SpeaQ (without reweighting)
4D-OR	ORacle
VRD	FactorizableNet
3R-Scan	SceneGraphFusion
MS-COCO	NeuSyRE

Libraries

Use these libraries to find Scene Graph Generation models and implementations.

rafa-cxg/PySGG-cxg

3 papers

suprosanna/relationformer

2 papers

shikorab/SceneGraph

2 papers


Most implemented papers

Sketching Image Gist: Human-Mimetic Hierarchical Scene Graph Generation

Kenneth-Wong/het-eccv20 • ECCV 2020

A scene graph aims to faithfully reveal humans' perception of image content.

PCPL: Predicate-Correlation Perception Learning for Unbiased Scene Graph Generation

coldmanck/recovering-unbiased-scene-graphs • 2 Sep 2020

Today, the scene graph generation (SGG) task is largely limited in realistic scenarios, mainly due to the extremely long-tailed distribution of predicate annotations.
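To make "long-tailed predicate distribution" concrete, a small sketch counts how often each predicate occurs in a list of (subject, predicate, object) annotations and how much of the data the most frequent predicates cover. The helpers `predicate_distribution` and `head_coverage` are hypothetical and are not part of the PCPL code; the toy annotations are made up for illustration and are not drawn from Visual Genome or any other dataset.

```python
from collections import Counter


def predicate_distribution(triplets):
    """Return (predicate, count, fraction) tuples, most frequent first."""
    counts = Counter(pred for _, pred, _ in triplets)
    total = sum(counts.values())
    return [(p, c, c / total) for p, c in counts.most_common()]


def head_coverage(triplets, k):
    """Fraction of all relationship annotations covered by the k most frequent predicates."""
    return sum(frac for _, _, frac in predicate_distribution(triplets)[:k])


# Toy annotations (hypothetical): a few generic predicates dominate,
# while more informative ones appear only once.
toy = (
    [("man", "on", "street")] * 6
    + [("cup", "on", "table")] * 2
    + [("man", "has", "hand")] * 3
    + [("man", "wearing", "shirt")] * 2
    + [("man", "riding", "horse")]
    + [("bird", "perched on", "branch")]
)

for pred, count, frac in predicate_distribution(toy):
    print(f"{pred:12s} {count:2d} {frac:.2f}")
print(f"top-2 coverage: {head_coverage(toy, 2):.2f}")  # top-2 coverage: 0.73
```

In this toy example two generic predicates ("on", "has") account for 11 of 15 annotations, so a model trained on such data can score well while rarely predicting the informative tail predicates, which is the bias the PCPL abstract refers to.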

CogTree: Cognition Tree Loss for Unbiased Scene Graph Generation

CYVincent/Scene-Graph-Transformer-CogTree • 16 Sep 2020

We first build a cognitive structure, CogTree, to organize the relationships based on the predictions of a biased SGG model.

Are scene graphs good enough to improve Image Captioning?

iacercalixto/butd-image-captioning • Asian Chapter of the Association for Computational Linguistics 2020

Overall, we find no significant difference between models that use scene graph features and models that only use object detection features across different captioning metrics, which suggests that existing scene graph generation models are still too noisy to be useful in image captioning.

Dense Relational Image Captioning via Multi-task Triple-Stream Networks

Dong-JinKim/DenseRelationalCaptioning • 8 Oct 2020

To this end, we propose the multi-task triple-stream network (MTTSNet), which consists of three recurrent units, each responsible for one part of speech (POS), and is trained by jointly predicting the correct captions and the POS of each word.
