CrossMoCo: Multi-modal Momentum Contrastive Learning for Point Cloud

A point cloud is a form of 3D geometric data that lacks a regular structure and is permutation-invariant. Applications of point clouds have recently gained significant attention in vision tasks. However, most existing work on point clouds relies on supervised learning with large labelled datasets, which are costly and laborious to collect. Unsupervised approaches, in particular self-supervised learning, have shown promising performance in various 2D computer vision tasks and hold potential for 3D computer vision applications. In this study, we introduce a novel self-supervised method called CrossMoCo, which learns representations of unlabelled point cloud data in a multi-modal setup that also uses 2D rendered images of the point clouds. CrossMoCo outperforms existing multi-modal self-supervised learning methods for point clouds by introducing two new concepts: momentum contrastive learning with more negative samples, and multiple-view intra-modal contrastive learning. The first component learns from an online encoder and a momentum encoder with a large number of negative samples, which provides consistent learning signals. The second component enforces consistency between different views of samples from the same modality, thereby improving the multi-modal representation. We conduct extensive studies on two popular benchmark datasets (ModelNet40 and ScanObjectNN) for linear classification and few-shot learning tasks. Our results demonstrate that CrossMoCo achieves superior performance over existing methods for both tasks on both datasets, with improvements of up to 4.36% on linear classification and up to 9.2% on few-shot tasks. Our code is available at https://github.com/snehaputul/CrossMoCo.
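The first component described above (an online encoder, a momentum encoder, and a large pool of negative samples) can be illustrated with a minimal NumPy sketch in the style of standard momentum contrastive learning. This is not the authors' implementation: the function names, the FIFO queue of negatives, the momentum coefficient `m=0.999`, and the temperature `0.07` are all assumptions chosen for illustration, and the encoders themselves are omitted (the functions operate on already-computed feature vectors).

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def momentum_update(online_params, momentum_params, m=0.999):
    """Exponential moving average: momentum <- m * momentum + (1 - m) * online.

    Both arguments are lists of arrays with matching shapes (hypothetical
    stand-ins for encoder weights). Returns the new momentum parameters.
    """
    return [m * pk + (1.0 - m) * pq
            for pq, pk in zip(online_params, momentum_params)]


def info_nce_with_queue(q, k, queue, temperature=0.07):
    """InfoNCE loss with one positive per query and K queued negatives.

    q:     (B, D) features from the online encoder
    k:     (B, D) features of the matching samples from the momentum encoder
    queue: (K, D) features stored from earlier batches, used as negatives
    """
    q, k, queue = l2_normalize(q), l2_normalize(k), l2_normalize(queue)
    pos = np.sum(q * k, axis=1, keepdims=True)   # (B, 1) positive similarities
    neg = q @ queue.T                            # (B, K) negative similarities
    logits = np.concatenate([pos, neg], axis=1) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[:, 0].mean()                # the positive sits at index 0


def enqueue(queue, k):
    """FIFO update: drop the oldest B rows and append the newest keys."""
    k = l2_normalize(k)
    return np.concatenate([queue[k.shape[0]:], k], axis=0)
```

In a training loop under these assumptions, each step would compute `q` and `k`, evaluate the loss, update the online encoder by gradient descent, call `momentum_update` for the momentum encoder, and then call `enqueue` so that the current keys serve as negatives for later batches. A larger queue gives more negatives per query without requiring a larger batch.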

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| 3D Point Cloud Linear Classification | ModelNet40 | CrossMoCo | Overall Accuracy | 91.49 | #6 |
| 3D Point Cloud Classification | ModelNet40 | CrossMoCo | Overall Accuracy | 91.49 | #90 |
| 3D Object Classification | ModelNet40 | CrossMoCo | Classification Accuracy | 91.49 | #3 |
| Few-Shot 3D Point Cloud Classification | ModelNet40 10-way (10-shot) | CrossMoCo | Overall Accuracy | 88.7 | #15 |
| Few-Shot 3D Point Cloud Classification | ModelNet40 10-way (10-shot) | CrossMoCo | Standard Deviation | 3.9 | #8 |
| Few-Shot 3D Point Cloud Classification | ModelNet40 10-way (20-shot) | CrossMoCo | Overall Accuracy | 91.0 | #15 |
| Few-Shot 3D Point Cloud Classification | ModelNet40 10-way (20-shot) | CrossMoCo | Standard Deviation | 3.4 | #17 |
| Few-Shot 3D Point Cloud Classification | ModelNet40 5-way (10-shot) | CrossMoCo | Overall Accuracy | 93.8 | #14 |
| Few-Shot 3D Point Cloud Classification | ModelNet40 5-way (10-shot) | CrossMoCo | Standard Deviation | 4.5 | #19 |
| Few-Shot 3D Point Cloud Classification | ModelNet40 5-way (20-shot) | CrossMoCo | Overall Accuracy | 96.8 | #13 |
| Few-Shot 3D Point Cloud Classification | ModelNet40 5-way (20-shot) | CrossMoCo | Standard Deviation | 1.7 | #13 |
| 3D Point Cloud Linear Classification | ScanObjectNN | CrossMoCo | Overall Accuracy | 86.06 | #1 |
| Few-Shot 3D Point Cloud Classification | ScanObjectNN 10-way (10-shot) | CrossMoCo | Overall Accuracy | 69.6 | #1 |
| Few-Shot 3D Point Cloud Classification | ScanObjectNN 10-way (20-shot) | CrossMoCo | Overall Accuracy | 78.1 | #1 |
| Few-Shot 3D Point Cloud Classification | ScanObjectNN 5-way (10-shot) | CrossMoCo | Overall Accuracy | 84.0 | #1 |
| Few-Shot 3D Point Cloud Classification | ScanObjectNN 5-way (20-shot) | CrossMoCo | Overall Accuracy | 87.6 | #1 |

Methods