Visual Coreference Resolution in Visual Dialog using Neural Module Networks

Visual dialog entails answering a series of questions grounded in an image, using dialog history as context. In addition to the challenges found in visual question answering (VQA), which can be seen as one-round dialog, visual dialog encompasses several more...

ECCV 2018
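TASK                    DATASET                      MODEL                   METRIC NAME   METRIC VALUE  GLOBAL RANK
Visual Dialog           VisDial v0.9 val             CorefNMN                MRR           63.6          # 5
Visual Dialog           VisDial v0.9 val             CorefNMN                Mean Rank     4.53          # 10
Visual Dialog           VisDial v0.9 val             CorefNMN                R@1           50.24         # 9
Visual Dialog           VisDial v0.9 val             CorefNMN                R@10          88.51         # 9
Visual Dialog           VisDial v0.9 val             CorefNMN                R@5           79.81         # 10
Visual Dialog           VisDial v0.9 val             CorefNMN (ResNet-152)   MRR           64.1          # 3
Visual Dialog           VisDial v0.9 val             CorefNMN (ResNet-152)   Mean Rank     4.45          # 8
Visual Dialog           VisDial v0.9 val             CorefNMN (ResNet-152)   R@1           50.92         # 7
Visual Dialog           VisDial v0.9 val             CorefNMN (ResNet-152)   R@10          88.81         # 8
Visual Dialog           VisDial v0.9 val             CorefNMN (ResNet-152)   R@5           80.18         # 9
Common Sense Reasoning  Visual Dialog v0.9           NMN [kottur2018visual]  1 in 10 R@5   80.1          # 1
Visual Dialog           Visual Dialog v1.0 test-std  CorefNMN (ResNet-152)   NDCG (x 100)  54.70         # 37
Visual Dialog           Visual Dialog v1.0 test-std  CorefNMN (ResNet-152)   MRR (x 100)   61.50         # 19
Visual Dialog           Visual Dialog v1.0 test-std  CorefNMN (ResNet-152)   R@1           47.55         # 19
Visual Dialog           Visual Dialog v1.0 test-std  CorefNMN (ResNet-152)   R@5           78.10         # 17
Visual Dialog           Visual Dialog v1.0 test-std  CorefNMN (ResNet-152)   R@10          88.80         # 16
Visual Dialog           Visual Dialog v1.0 test-std  CorefNMN (ResNet-152)   Mean          4.40          # 23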
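
The retrieval metrics reported above (MRR, R@k, Mean Rank) are all functions of the rank the model assigns to the ground-truth answer among the candidate answers for each question. The following is a minimal sketch of how such numbers are typically computed, not the paper's or the benchmark's evaluation code. The function name `retrieval_metrics`, its arguments, and the choice to report MRR and R@k on a 0 to 100 scale are assumptions made for illustration; NDCG is omitted because it needs per-candidate relevance scores.

```python
def retrieval_metrics(gt_ranks, ks=(1, 5, 10)):
    """Compute MRR, Mean Rank and R@k from ground-truth answer ranks.

    gt_ranks: one 1-based rank per question (1 = ground truth ranked first).
    Hypothetical helper for illustration; MRR and R@k are scaled by 100.
    """
    if not gt_ranks:
        raise ValueError("gt_ranks must contain at least one rank")
    if any(r < 1 for r in gt_ranks):
        raise ValueError("ranks are 1-based and must be >= 1")

    n = len(gt_ranks)
    metrics = {
        # Mean reciprocal rank: average of 1/rank, higher is better.
        "MRR": 100.0 * sum(1.0 / r for r in gt_ranks) / n,
        # Mean rank of the ground-truth answer, lower is better.
        "Mean Rank": sum(gt_ranks) / n,
    }
    for k in ks:
        # Recall@k: share of questions whose ground truth is in the top k.
        metrics[f"R@{k}"] = 100.0 * sum(r <= k for r in gt_ranks) / n
    return metrics


if __name__ == "__main__":
    # Toy ranks for five questions, not data from the paper.
    print(retrieval_metrics([1, 2, 4, 10, 20]))
```

Higher is better for MRR and R@k, while lower is better for Mean Rank, which is why the two kinds of columns move in opposite directions between the CorefNMN rows.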

Methods used in the Paper


METHOD          TYPE
Memory Network  Working Memory Models