Single Frame Semantic Segmentation Using Multi-Modal Spherical Images

18 Aug 2023 · Suresh Guttikonda, Jason Rambach

In recent years, the research community has shown a lot of interest in panoramic images, which offer a 360-degree directional perspective. To fully realize their potential, multiple data modalities can be fed in and their complementary characteristics exploited for richer and more robust scene interpretation based on semantic segmentation. Existing research, however, has mostly concentrated on pinhole RGB-X semantic segmentation. In this study, we propose a transformer-based cross-modal fusion architecture to bridge the gap between multi-modal fusion and omnidirectional scene perception. We employ distortion-aware modules to address the extreme object deformations and panorama distortions that result from the equirectangular representation. Additionally, before merging the features, we conduct cross-modal interactions for feature rectification and information exchange in order to communicate long-range contexts for bi-modal and tri-modal feature streams. In extensive experiments using combinations of four different modality types on three indoor panoramic-view datasets, our technique achieved state-of-the-art mIoU performance: 60.60% on Stanford2D3DS (RGB-HHA), 71.97% on Structured3D (RGB-D-N), and 35.92% on Matterport3D (RGB-D). We plan to release all code and trained models soon.
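
The panorama distortions mentioned above come from the equirectangular projection itself: every latitude is sampled with the same number of pixels, so rows near the poles are stretched horizontally. The sketch below is standard spherical geometry, not the authors' distortion-aware module, and all function names are hypothetical. It maps a pixel to longitude/latitude and a unit viewing direction, and computes the horizontal stretch factor of a pixel row relative to the equator, which equals 1/cos(latitude) at the row center.

```python
import math


def pixel_to_sphere(u: float, v: float, width: int, height: int) -> tuple[float, float]:
    """Map the center of pixel (u, v) in a width x height equirectangular image to (lon, lat) in radians."""
    lon = (u + 0.5) / width * 2.0 * math.pi - math.pi     # in [-pi, pi)
    lat = math.pi / 2.0 - (v + 0.5) / height * math.pi    # near +pi/2 at the top row, near -pi/2 at the bottom
    return lon, lat


def sphere_to_direction(lon: float, lat: float) -> tuple[float, float, float]:
    """Unit viewing direction with y pointing up and longitude measured from +z toward +x."""
    return (math.cos(lat) * math.sin(lon), math.sin(lat), math.cos(lat) * math.cos(lon))


def horizontal_stretch(v: int, height: int) -> float:
    """Factor by which pixel row v is stretched horizontally relative to the equator: 1 / cos(latitude)."""
    _, lat = pixel_to_sphere(0, v, 1, height)  # longitude is irrelevant for the stretch factor
    return 1.0 / math.cos(lat)


if __name__ == "__main__":
    H = 512
    for v in (H // 2, H // 4, H // 16, 0):
        print(f"row {v:3d}: stretch x{horizontal_stretch(v, H):.2f}")
```

Near the poles the factor grows without bound, which is why objects close to the ceiling and floor of an indoor panorama appear heavily deformed.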


Results from the Paper

All results are for the Semantic Segmentation task. Metric values are percentages.

Dataset                 Model                         Metric           Value (%)  Global Rank
Matterport3D            SFSS-MMSI (RGB+Depth)         Validation mIoU  39.19      #2
Matterport3D            SFSS-MMSI (RGB+Depth)         Test mIoU        35.92      #1
Matterport3D            SFSS-MMSI (RGB+Depth+Normal)  Validation mIoU  39.26      #1
Matterport3D            SFSS-MMSI (RGB+Depth+Normal)  Test mIoU        35.52      #3
Matterport3D            SFSS-MMSI (RGB+Normal)        Validation mIoU  38.91      #3
Matterport3D            SFSS-MMSI (RGB+Normal)        Test mIoU        35.77      #2
Matterport3D            SFSS-MMSI (RGB Only)          Validation mIoU  35.15      #4
Matterport3D            SFSS-MMSI (RGB Only)          Test mIoU        31.3       #4
Stanford2D3D Panoramic  SFSS-MMSI (RGB Only)          mIoU             52.87      #9
Stanford2D3D Panoramic  SFSS-MMSI (RGB Only)          mAcc             63.96      #9
Stanford2D3D Panoramic  SFSS-MMSI (RGB+Depth+Normal)  mIoU             59.43      #2
Stanford2D3D Panoramic  SFSS-MMSI (RGB+Depth+Normal)  mAcc             69.03      #2
Stanford2D3D Panoramic  SFSS-MMSI (RGB+HHA)           mIoU             60.6       #1
Stanford2D3D Panoramic  SFSS-MMSI (RGB+HHA)           mAcc             70.68      #1
Stanford2D3D Panoramic  SFSS-MMSI (RGB+Normal)        mIoU             58.24      #3
Stanford2D3D Panoramic  SFSS-MMSI (RGB+Normal)        mAcc             68.79      #3
Stanford2D3D Panoramic  SFSS-MMSI (RGB+Depth)         mIoU             55.49      #5
Stanford2D3D Panoramic  SFSS-MMSI (RGB+Depth)         mAcc             68.57      #4
Structured3D            SFSS-MMSI (RGB+Normal)        Validation mIoU  74.38      #2
Structured3D            SFSS-MMSI (RGB+Normal)        Test mIoU        71         #2
Structured3D            SFSS-MMSI (RGB+Depth)         Validation mIoU  73.78      #3
Structured3D            SFSS-MMSI (RGB+Depth)         Test mIoU        70.17      #3
Structured3D            SFSS-MMSI (RGB Only)          Validation mIoU  71.94      #4
Structured3D            SFSS-MMSI (RGB Only)          Test mIoU        68.34      #4
Structured3D            SFSS-MMSI (RGB+Depth+Normal)  Validation mIoU  75.86      #1
Structured3D            SFSS-MMSI (RGB+Depth+Normal)  Test mIoU        71.97      #1
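
The results are reported as mIoU and mAcc. As a point of reference, here is a minimal sketch of how these two metrics are commonly computed from a confusion matrix, assuming the usual definitions (per-class IoU = TP / (TP + FP + FN), per-class accuracy = TP / (TP + FN), both averaged over classes) and assuming that classes absent from the ground truth are skipped. This is not the authors' evaluation code; the function names and the ignore label 255 are hypothetical.

```python
from typing import List, Sequence, Tuple


def confusion_matrix(pred: Sequence[int], gt: Sequence[int], num_classes: int,
                     ignore_index: int = 255) -> List[List[int]]:
    """conf[g][p] counts pixels whose ground-truth class is g and predicted class is p."""
    conf = [[0] * num_classes for _ in range(num_classes)]
    for p, g in zip(pred, gt):
        if g == ignore_index:
            continue  # unlabeled pixels are excluded from every class
        conf[g][p] += 1
    return conf


def miou_macc(conf: List[List[int]]) -> Tuple[float, float]:
    """Return (mIoU, mAcc) in percent from a square confusion matrix."""
    n = len(conf)
    ious, accs = [], []
    for c in range(n):
        tp = conf[c][c]                              # pixels of class c predicted correctly
        gt_c = sum(conf[c])                          # TP + FN: pixels labeled c
        pred_c = sum(conf[r][c] for r in range(n))   # TP + FP: pixels predicted as c
        if gt_c == 0:
            continue  # assumption: classes absent from the ground truth are skipped
        ious.append(tp / (gt_c + pred_c - tp))       # TP / (TP + FP + FN)
        accs.append(tp / gt_c)                       # TP / (TP + FN)
    if not ious:
        raise ValueError("no labeled pixels in the confusion matrix")
    return 100.0 * sum(ious) / len(ious), 100.0 * sum(accs) / len(accs)


# Toy example: 3 classes; the last pixel is unlabeled (255) and ignored.
gt = [0, 0, 1, 1, 2, 255]
pred = [0, 1, 1, 1, 2, 0]
miou, macc = miou_macc(confusion_matrix(pred, gt, num_classes=3))
print(f"mIoU = {miou:.2f}%, mAcc = {macc:.2f}%")  # prints: mIoU = 72.22%, mAcc = 83.33%
```

For a dataset-level score, per-image confusion matrices are normally summed over the whole split before the ratios are computed, rather than averaging per-image mIoU. The exact protocol behind the benchmark numbers (for example, which classes are excluded) is not stated on this page.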
