Pseudo Label-Guided Multi Task Learning for Scene Understanding
Multi-task learning (MTL) for scene understanding has been actively studied by exploiting correlation of multiple tasks. This work focuses on improving the performance of the MTL network that infers depth and semantic segmentation maps from a single image. Specifically, we propose a novel MTL architecture, called Pseudo-MTL, that introduces pseudo labels for joint learning of monocular depth estimation and semantic segmentation tasks. The pseudo ground truth depth maps, generated from pretrained stereo matching methods, are leveraged to supervise the monocular depth estimation. More importantly, the pseudo depth labels serve to impose a cross-view consistency on the estimated monocular depth and segmentation maps of two views. This enables for mitigating the mismatch problem incurred by inconsistent prediction results across two views. A thorough ablation study validates that the cross-view consistency leads to a substantial performance gain by ensuring inference-view invariance for the two tasks.
PDF Abstract