K-means for unsupervised instance segmentation using a self-supervised transformer

Instance segmentation is a fundamental task in computer vision that assigns every pixel to an appropriate class and localizes objects into bounding boxes. However, collecting pixel-level segmentation labels is more resource- and time-consuming than collecting classification and detection labels. Herein, we present a novel approach, iterative mask refinement using a self-supervised transformer (IMST), which performs class-agnostic unsupervised instance segmentation using simple K-means clustering and a self-supervised vision transformer. IMST generates pseudo-ground-truth labels that can be used to train an off-the-shelf instance segmentation model. The pseudo labels demonstrate improved performance on multiple datasets. The instance segmentation model trained on the pseudo labels outperforms state-of-the-art unsupervised instance segmentation methods on COCO20k (+4.0 average precision (AP)) and COCO val2017 (+2.6 AP) without modifications to the training loss or architecture. We demonstrate that our method can be extended to tasks such as single/multiple object discovery and supervised fine-tuning for instance segmentation while outperforming previous methods.
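To make the clustering idea in the abstract concrete, here is a minimal sketch of plain K-means applied to a grid of patch features. It is not the IMST pipeline: the iterative mask refinement is not shown, and the features are synthetic stand-ins for the output of a self-supervised vision transformer. The names `kmeans` and `cluster_patches`, the 4x4 grid, and the choice of k=2 are all assumptions made for illustration.

```python
import numpy as np

def kmeans(features, k, n_iters=20, seed=0):
    """Plain Lloyd's K-means on an (N, D) array; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct, randomly chosen feature vectors.
    init_idx = rng.choice(len(features), size=k, replace=False)
    centroids = features[init_idx].astype(float)
    for _ in range(n_iters):
        # Squared Euclidean distance from every feature to every centroid: (N, k).
        dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return labels, centroids

def cluster_patches(patch_features, k=2):
    """Cluster an (H, W, D) grid of patch features into an (H, W) label map."""
    h, w, d = patch_features.shape
    labels, _ = kmeans(patch_features.reshape(h * w, d), k)
    return labels.reshape(h, w)

# Synthetic stand-in for ViT patch features (assumption): a 4x4 grid of
# 8-dimensional vectors in which the central 2x2 patches form an "object".
rng = np.random.default_rng(0)
feats = rng.normal(0.0, 0.1, size=(4, 4, 8))
feats[1:3, 1:3] += 5.0
label_map = cluster_patches(feats, k=2)
print(label_map)
```

Cluster indices are arbitrary, so which label denotes the object has to be decided by a separate rule. Upsampling the patch-level label map to pixel resolution is also left out of this sketch.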

Results from the Paper


Task | Dataset | Model | Metric | Value | Global rank
Single-object discovery | COCO_20k | IMST | CorLoc | 72.2 | #1
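
For readers unfamiliar with the CorLoc metric in the table, the sketch below shows how it is commonly computed for single-object discovery: the percentage of images whose single predicted box overlaps at least one ground-truth box with sufficient IoU. The 0.5 threshold, the `>=` comparison (some implementations use a strict `>`), the `(x1, y1, x2, y2)` box format, and the function names are assumptions, not details taken from the paper.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def corloc(pred_boxes, gt_boxes_per_image, iou_thresh=0.5):
    """Percentage of images whose predicted box matches any ground-truth box."""
    hits = sum(
        any(box_iou(pred, gt) >= iou_thresh for gt in gts)
        for pred, gts in zip(pred_boxes, gt_boxes_per_image)
    )
    return 100.0 * hits / len(pred_boxes)

# Two toy images, one predicted box each (hypothetical data).
preds = [(0, 0, 10, 10), (0, 0, 10, 10)]
gts = [
    [(0, 0, 10, 8)],                       # IoU 0.8 with the prediction: hit
    [(20, 20, 30, 30), (8, 8, 12, 12)],    # no box reaches IoU 0.5: miss
]
score = corloc(preds, gts)
print(score)
```

Under this definition an image counts as a hit if any one of its ground-truth boxes is matched, so the metric rewards localizing a single object per image rather than finding all of them.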
