SemiVL: Semi-Supervised Semantic Segmentation with Vision-Language Guidance

27 Nov 2023 · Lukas Hoyer, David Joseph Tan, Muhammad Ferjad Naeem, Luc van Gool, Federico Tombari

In semi-supervised semantic segmentation, a model is trained with a limited number of labeled images along with a large corpus of unlabeled images to reduce the high annotation effort. While previous methods are able to learn good segmentation boundaries, they are prone to confusing classes with similar visual appearance due to the limited supervision. On the other hand, vision-language models (VLMs) are able to learn diverse semantic knowledge from image-caption datasets but produce noisy segmentation due to their image-level training. In SemiVL, we propose to integrate rich priors from VLM pre-training into semi-supervised semantic segmentation to learn better semantic decision boundaries. To adapt the VLM from global to local reasoning, we introduce a spatial fine-tuning strategy for label-efficient learning. Further, we design a language-guided decoder to jointly reason over vision and language. Finally, we propose to handle inherent ambiguities in class labels by providing the model with language guidance in the form of class definitions. We evaluate SemiVL on four semantic segmentation datasets, where it significantly outperforms previous semi-supervised methods. For instance, SemiVL improves the state of the art by +13.5 mIoU on COCO with 232 annotated images and by +6.1 mIoU on Pascal VOC with 92 labels. Project page: https://github.com/google-research/semivl
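Vision-language models are typically applied to dense prediction by comparing each pixel's embedding with one text embedding per class, so the "classifier" is defined by language rather than by fixed learned weights. The following is a minimal, hypothetical sketch of that generic pixel-text matching, not SemiVL's language-guided decoder: the embeddings are toy vectors standing in for image and text encoder outputs, and the function names (`pixel_class_probs`, `predict`) and the temperature of 0.01 are assumptions made for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pixel_class_probs(
    pixel_embeddings: Sequence[Vector],
    text_embeddings: Sequence[Vector],
    temperature: float = 0.01,  # assumed value; CLIP-style models learn this scale
) -> List[List[float]]:
    """Per-pixel class probabilities from cosine similarity to class text embeddings.

    pixel_embeddings: one D-dim feature per pixel (an H*W map, flattened).
    text_embeddings: one D-dim embedding per class, e.g. encoded from a class
    name or a longer class definition (hypothetical inputs in this sketch).
    """
    texts = [l2_normalize(t) for t in text_embeddings]
    probs = []
    for pixel in pixel_embeddings:
        p = l2_normalize(pixel)
        logits = [sum(a * b for a, b in zip(p, t)) / temperature for t in texts]
        probs.append(softmax(logits))
    return probs


def predict(probs: Sequence[Sequence[float]]) -> List[int]:
    """Pick the most likely class index for every pixel."""
    return [max(range(len(row)), key=row.__getitem__) for row in probs]


# Toy example: three classes with orthogonal text embeddings, three pixels.
text_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pixel_embeddings = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1], [0.0, 0.1, 0.95]]
probs = pixel_class_probs(pixel_embeddings, text_embeddings)
labels = predict(probs)
print(labels)  # [0, 1, 2]
```

In this sketch, changing the text embedding of a class moves its decision region for every pixel at once, which is why the wording used to describe a class can matter for a language-driven classifier.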

Results from the Paper

Task: Semi-Supervised Semantic Segmentation (all rows). Metric: validation mIoU (%). Global rank is the entry's position on the corresponding benchmark leaderboard.

Dataset | Labeled split | Model | Validation mIoU (%) | Global rank
ADE20K | 1/32 | SemiVL | 35.1 | #1
ADE20K | 1/16 | SemiVL | 37.2 | #1
Cityscapes | 100 samples | SemiVL (ViT-B/16) | 76.2 | #1
Cityscapes | 6.25% | SemiVL (ViT-B/16) | 77.9 | #1
Cityscapes | 12.5% | SemiVL (ViT-B/16) | 79.4 | #1
Cityscapes | 25% | SemiVL (ViT-B/16) | 80.3 | #1
Cityscapes | 50% | SemiVL (ViT-B/16) | 80.6 | #1
COCO | 1/512 | SemiVL | 50.1 | #1
COCO | 1/256 | SemiVL | 52.8 | #1
COCO | 1/128 | SemiVL | 53.6 | #1
COCO | 1/64 | SemiVL | 55.4 | #1
COCO | 1/32 | SemiVL | 56.5 | #1
PASCAL VOC 2012 | 92 labels | SemiVL (ViT-B/16) | 84.0 | #1
PASCAL VOC 2012 | 92 labels | UniMatch (ViT-B/16) | 77.9 | #2
PASCAL VOC 2012 | 183 labels | SemiVL (ViT-B/16) | 85.6 | #1
PASCAL VOC 2012 | 183 labels | UniMatch (ViT-B/16) | 80.1 | #2
PASCAL VOC 2012 | 366 labels | SemiVL (ViT-B/16) | 86.0 | #1
PASCAL VOC 2012 | 366 labels | UniMatch (ViT-B/16) | 82.0 | #2
PASCAL VOC 2012 | 732 labels | SemiVL (ViT-B/16) | 86.7 | #1
PASCAL VOC 2012 | 732 labels | UniMatch (ViT-B/16) | 83.3 | #2
PASCAL VOC 2012 | 1464 labels | SemiVL (ViT-B/16) | 87.3 | #1
PASCAL VOC 2012 | 1464 labels | UniMatch (ViT-B/16) | 84.0 | #2
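The results are reported as mIoU: for each class, intersection over union (true positives divided by true positives plus false positives plus false negatives), averaged over classes and expressed in percent. The following is a minimal sketch of how such a score is commonly computed, with counts accumulated over the whole validation set before dividing. It is not the evaluation code used by SemiVL or by the benchmark; the function names and the ignore label 255 are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple


def accumulate_iou_counts(
    pred: Sequence[int],
    target: Sequence[int],
    num_classes: int,
    ignore_index: int = 255,  # assumed "void" label, a common convention
) -> Tuple[List[int], List[int]]:
    """Count per-class intersection and union over flattened label maps."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixels do not count toward any class
        if p == t:
            inter[t] += 1  # true positive
            union[t] += 1
        else:
            union[p] += 1  # false positive for the predicted class
            union[t] += 1  # false negative for the true class
    return inter, union


def mean_iou(inter: Sequence[int], union: Sequence[int]) -> float:
    """Mean IoU in percent over classes that occur in prediction or target."""
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return 100.0 * sum(ious) / len(ious) if ious else 0.0


# Toy check: 3 classes, the last pixel carries the ignore label.
target = [0, 0, 1, 1, 2, 255]
pred = [0, 1, 1, 1, 2, 0]
inter, union = accumulate_iou_counts(pred, target, num_classes=3)
print(round(mean_iou(inter, union), 2))  # 72.22
```

Accumulating counts across all images before dividing, rather than averaging per-image scores, keeps small or rare objects from being over- or under-weighted by individual images.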
