Zero-Shot Video Object Segmentation via Attentive Graph Neural Networks

This work proposes a novel attentive graph neural network (AGNN) for zero-shot video object segmentation (ZVOS). The proposed AGNN recasts the task as a process of iterative information fusion over video graphs. Specifically, AGNN builds a fully connected graph that represents frames as nodes and the relations between arbitrary frame pairs as edges; the underlying pairwise relations are modeled by a differentiable attention mechanism. Through parametric message passing, AGNN efficiently captures and mines richer, higher-order relations among video frames, enabling a more complete understanding of video content and more accurate foreground estimation. Experimental results on three video segmentation datasets show that AGNN sets a new state of the art in each case. To further demonstrate the generalizability of our framework, we extend AGNN to an additional task: image object co-segmentation (IOCS). Experiments on two popular IOCS datasets again confirm the superiority of our AGNN model. These extensive experiments verify that AGNN learns the underlying semantic/appearance relationships among video frames or related images and discovers the common objects.
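To make the abstract's mechanism concrete, below is a minimal PyTorch sketch of the core idea: frames become nodes of a fully connected graph, each edge is a differentiable attention map over pairwise feature correlations, and node states are refined by iterated message passing. Everything here (the class name AttentiveMessagePassing, the bilinear scoring matrix, the 1x1-convolution update) is an illustrative assumption, not the paper's implementation.

```python
# Minimal, illustrative sketch of attention-based message passing over a
# fully connected frame graph (NOT the authors' implementation). The class
# name, the bilinear edge scoring, and the 1x1-conv gated update are
# simplifying assumptions; AGNN's actual modules are richer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveMessagePassing(nn.Module):
    """One round of message passing: every frame (node) receives
    attention-weighted messages from every other frame (edge)."""

    def __init__(self, channels: int):
        super().__init__()
        # Bilinear form scoring pairwise feature correlations (the edge
        # attention); stands in for the paper's differentiable attention.
        self.W = nn.Parameter(torch.empty(channels, channels))
        nn.init.xavier_uniform_(self.W)
        # Simple learned update fusing a node's state with its messages.
        self.update = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def edge_attention(self, h_i: torch.Tensor, h_j: torch.Tensor) -> torch.Tensor:
        # h_i, h_j: (C, H, W) feature maps; returns an (HW, HW) affinity.
        c = h_i.shape[0]
        a = h_i.reshape(c, -1).t()   # (HW, C)
        b = h_j.reshape(c, -1)       # (C, HW)
        return a @ self.W @ b        # (HW, HW)

    def forward(self, nodes):
        # nodes: list of N >= 2 feature maps, each (C, H, W), same size.
        new_nodes = []
        for i, h_i in enumerate(nodes):
            c, h, w = h_i.shape
            msgs = []
            for j, h_j in enumerate(nodes):
                if i == j:
                    continue
                attn = F.softmax(self.edge_attention(h_i, h_j), dim=-1)
                # Message: attention-weighted mix of frame j's features.
                m = attn @ h_j.reshape(c, -1).t()      # (HW, C)
                msgs.append(m.t().reshape(c, h, w))
            m_sum = torch.stack(msgs).sum(dim=0)       # aggregate messages
            fused = torch.cat([h_i, m_sum], dim=0).unsqueeze(0)
            new_nodes.append(torch.relu(self.update(fused)).squeeze(0))
        return new_nodes

# Usage: K rounds of fusion over per-frame backbone features; a readout
# head (omitted) would then decode each node into a foreground mask.
frames = [torch.randn(64, 30, 52) for _ in range(3)]
gnn = AttentiveMessagePassing(channels=64)
for _ in range(3):
    frames = gnn(frames)
```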


Datasets

DAVIS 2016, YouTube-Objects

Results

Task                                    Dataset          Model  Metric                 Value  Global Rank
Unsupervised Video Object Segmentation  DAVIS 2016 val   AGNN   G (mean of J and F)    79.9   #20
Unsupervised Video Object Segmentation  DAVIS 2016 val   AGNN   J (region similarity)  80.7   #19
Unsupervised Video Object Segmentation  DAVIS 2016 val   AGNN   F (boundary accuracy)  79.1   #19
Unsupervised Video Object Segmentation  YouTube-Objects  AGNN   J (region similarity)  70.8   #4
