Contrastive Learning and Self-Training for Unsupervised Domain Adaptation in Semantic Segmentation

5 May 2021 · Robert A. Marsden, Alexander Bartler, Mario Döbler, Bin Yang

Deep convolutional neural networks have considerably improved state-of-the-art results for semantic segmentation. Nevertheless, even modern architectures generalize poorly to test data that originates from a different domain. To avoid the costly annotation of training data for unseen domains, unsupervised domain adaptation (UDA) attempts to provide efficient knowledge transfer from a labeled source domain to an unlabeled target domain. Previous work has mainly focused on minimizing the discrepancy between the two domains by using adversarial training or self-training. Adversarial training may fail to align the correct semantic categories because it minimizes the discrepancy between the global distributions, whereas self-training raises the question of how to obtain reliable pseudo-labels. To align the correct semantic categories across domains, we propose a contrastive learning approach that adapts category-wise centroids across domains. Furthermore, we extend our method with self-training, where we use a memory-efficient temporal ensemble to generate consistent and reliable pseudo-labels. Although contrastive learning and self-training through temporal ensembling each enable knowledge transfer between two domains on their own, it is their combination (CLST) that leads to a symbiotic structure in which the two components benefit each other. We validate our approach on two domain adaptation benchmarks: GTA5 → Cityscapes and SYNTHIA → Cityscapes. Our method achieves results better than or comparable to the state of the art. We will make the code publicly available.
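The abstract does not state the exact contrastive loss or the ensembling rule. The following is a minimal NumPy sketch of the two ideas under stated assumptions: per-pixel embeddings arrive as flat (N, D) arrays, target-domain classes come from pseudo-labels, the centroid objective is an InfoNCE-style loss with a temperature, and the temporal ensemble is an exponential moving average of softmax outputs in the style of Laine and Aila's temporal ensembling. The names `class_centroids`, `centroid_contrastive_loss` and `TemporalEnsemble`, and all default values, are hypothetical; this is not the authors' implementation and it does not reproduce their memory-saving scheme.

```python
import numpy as np


def class_centroids(features, labels, num_classes):
    """Mean embedding per class (hypothetical helper).

    features: (N, D) per-pixel embeddings; labels: (N,) integer class ids
    (ground truth for source, pseudo-labels for target). Pixels labeled -1
    match no class and are skipped. Rows for absent classes are NaN.
    """
    centroids = np.full((num_classes, features.shape[1]), np.nan)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            centroids[c] = features[mask].mean(axis=0)
    return centroids


def centroid_contrastive_loss(src_centroids, tgt_centroids, temperature=0.1):
    """InfoNCE-style loss (assumed form): each target centroid should be most
    similar to the source centroid of the same class, less to the others."""
    # Only classes present in both domains take part.
    valid = ~(np.isnan(src_centroids).any(axis=1) | np.isnan(tgt_centroids).any(axis=1))
    if not valid.any():
        return 0.0
    s = src_centroids[valid]
    t = tgt_centroids[valid]
    # Unit-normalize so that dot products are cosine similarities.
    s = s / np.linalg.norm(s, axis=1, keepdims=True)
    t = t / np.linalg.norm(t, axis=1, keepdims=True)
    logits = t @ s.T / temperature  # (K, K); positives on the diagonal
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


class TemporalEnsemble:
    """EMA of per-pixel softmax outputs across epochs (Laine & Aila style).

    Stores full float32 probabilities, so it is NOT a memory-efficient variant.
    """

    def __init__(self, num_pixels, num_classes, momentum=0.6):
        self.momentum = momentum
        self.avg = np.zeros((num_pixels, num_classes), dtype=np.float32)
        self.steps = 0

    def update(self, probs):
        """Blend in this epoch's (num_pixels, num_classes) softmax output."""
        self.avg = self.momentum * self.avg + (1.0 - self.momentum) * probs
        self.steps += 1

    def pseudo_labels(self, threshold=0.9):
        """Argmax of the bias-corrected average; unconfident pixels get -1."""
        if self.steps == 0:
            raise RuntimeError("call update() at least once first")
        # Undo the bias toward the zero initialization of the average.
        corrected = self.avg / (1.0 - self.momentum ** self.steps)
        labels = corrected.argmax(axis=1)
        labels[corrected.max(axis=1) < threshold] = -1
        return labels
```

In an actual training loop the loss would be computed on autograd tensors (for example in PyTorch) so that it can be backpropagated into the feature extractor; NumPy is used here only to keep the sketch self-contained.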

Task | Dataset | Model | Metric | Value | Global rank
Synthetic-to-Real Translation | GTAV-to-Cityscapes Labels | CLST | mIoU | 51.6 | #39
Synthetic-to-Real Translation | SYNTHIA-to-Cityscapes | CLST (ResNet-101) | mIoU (13 classes) | 57.8 | #19
Synthetic-to-Real Translation | SYNTHIA-to-Cityscapes | CLST (ResNet-101) | mIoU (16 classes) | 49.8 | #20
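The results above are reported as mean intersection-over-union (mIoU); for SYNTHIA → Cityscapes the score is averaged over 13 and over 16 classes, i.e. over two subsets of the Cityscapes classes. The sketch below shows the usual computation from a confusion matrix accumulated over a dataset, with an optional class subset. The function names, the ignore label 255 and the `classes` argument are assumptions for illustration; the exact class lists of the 13- and 16-class protocols are not given here.

```python
import numpy as np


def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """(num_classes x num_classes) counts; rows = ground truth, cols = prediction."""
    valid = target != ignore_index  # drop unlabeled pixels
    idx = num_classes * target[valid].astype(np.int64) + pred[valid].astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)


def mean_iou(conf, classes=None):
    """mIoU = mean over classes of TP / (TP + FP + FN), optionally on a subset."""
    tp = np.diag(conf).astype(np.float64)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp  # TP + FP + FN per class
    if classes is not None:
        tp, union = tp[list(classes)], union[list(classes)]
    present = union > 0  # skip classes that never occur
    return float(np.mean(tp[present] / union[present]))
```

Per-image confusion matrices are summed over the whole validation set before calling `mean_iou`, so the score is not an average of per-image IoUs.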

Methods