Adaptive Context Network for Scene Parsing

Recent works attempt to improve scene parsing performance by exploring different levels of context, typically training a well-designed convolutional network to exploit useful context across all pixels equally. In this paper, however, we find that context demands vary across pixels and regions within each image. Based on this observation, we propose an Adaptive Context Network (ACNet) that captures pixel-aware context through a competitive fusion of global and local context according to per-pixel demands. Specifically, for a given pixel, the global context demand is measured by the similarity between the global feature and the pixel's local feature, and the reverse value of this similarity measures the local context demand. We model the two demand measurements with the proposed global context module and local context module, respectively, to generate adaptive contextual features. Furthermore, we incorporate multiple such modules to build adaptive context blocks at different levels of the network, yielding a coarse-to-fine result. Finally, comprehensive experimental evaluations demonstrate the effectiveness of the proposed ACNet, which achieves new state-of-the-art performance on four public datasets: Cityscapes, ADE20K, PASCAL Context, and COCO Stuff.
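
The per-pixel gating idea in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the mean-pooled global feature, the cosine similarity, the rescaling of that similarity to [0, 1], the use of `1 - gate` as the "reverse value", and the function names `global_average`, `cosine`, and `adaptive_fusion` are all assumptions made for illustration.

```python
import math

def global_average(features):
    """Mean feature vector over all pixels (assumed stand-in for global pooling)."""
    n = len(features)
    dim = len(features[0])
    return [sum(f[d] for f in features) / n for d in range(dim)]

def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def adaptive_fusion(features, local_context):
    """Fuse global and local context per pixel.

    features:      list of per-pixel feature vectors
    local_context: list of per-pixel local context vectors (same shape)
    Returns (fused vectors, per-pixel global gates).
    """
    g = global_average(features)
    fused, gates = [], []
    for f, l in zip(features, local_context):
        # Assumption: map cosine similarity from [-1, 1] to a gate in [0, 1].
        gate = min(1.0, max(0.0, 0.5 * (cosine(f, g) + 1.0)))
        gates.append(gate)
        # Global demand is the gate; local demand is assumed to be 1 - gate.
        fused.append([gate * gv + (1.0 - gate) * lv for gv, lv in zip(g, l)])
    return fused, gates

# Three "pixels" with 2-dimensional features; the third differs from the majority.
pixels = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
fused, gates = adaptive_fusion(pixels, pixels)
```

In this toy input, the two pixels that resemble the global feature receive a larger global gate than the outlier pixel, which therefore leans more on its local context.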

ICCV 2019

Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Semantic Segmentation | ADE20K | ACNet (ResNet-101) | Validation mIoU | 45.90 | # 176 |
| Semantic Segmentation | ADE20K val | ACNet (ResNet-101) | mIoU | 45.90 | # 72 |
