AutoSAM: Adapting SAM to Medical Images by Overloading the Prompt Encoder
The recently introduced Segment Anything Model (SAM) combines a clever architecture and large quantities of training data to obtain remarkable image segmentation capabilities. However, it fails to reproduce such results for out-of-distribution (OOD) domains such as medical images. Moreover, while SAM is conditioned on either a mask or a set of points, it may be desirable to have a fully automatic solution. In this work, we replace SAM's conditioning with an encoder that operates on the same input image. By adding this encoder and without further fine-tuning SAM, we obtain state-of-the-art results on multiple medical image and video benchmarks. This new encoder is trained via gradients provided by a frozen SAM. To inspect the knowledge within the encoder and to provide a lightweight segmentation solution, we also learn to decode its output into a mask with a shallow deconvolution network.
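The training setup described in the abstract (a new image-conditioned encoder that replaces the prompt, updated only through gradients that flow back from a frozen segmenter) can be illustrated with a minimal PyTorch sketch. Everything here is an assumption made for illustration: `ToyFrozenSegmenter` is a tiny placeholder and not SAM, `PromptFromImage` is a hypothetical name for the new encoder, the prompt embedding is simply added to the image features, and binary cross-entropy stands in for whatever loss the paper uses.

```python
import torch
import torch.nn as nn


class PromptFromImage(nn.Module):
    """Hypothetical stand-in for the new encoder: image -> prompt-like embedding."""

    def __init__(self, in_ch=3, embed_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, embed_dim, kernel_size=3, stride=2, padding=1),
        )

    def forward(self, image):
        return self.net(image)  # (B, embed_dim, H/4, W/4)


class ToyFrozenSegmenter(nn.Module):
    """Toy placeholder for a pretrained promptable segmenter (NOT the real SAM)."""

    def __init__(self, in_ch=3, embed_dim=8):
        super().__init__()
        self.image_encoder = nn.Conv2d(in_ch, embed_dim, kernel_size=3, stride=4, padding=1)
        self.mask_decoder = nn.ConvTranspose2d(embed_dim, 1, kernel_size=4, stride=4)

    def forward(self, image, prompt_embedding):
        feats = self.image_encoder(image)
        # Assumption: conditioning is injected by simple addition.
        return self.mask_decoder(feats + prompt_embedding)  # mask logits


def train_prompt_encoder(steps=5, seed=0):
    torch.manual_seed(seed)
    segmenter = ToyFrozenSegmenter()
    for p in segmenter.parameters():
        p.requires_grad_(False)  # the segmenter stays frozen
    segmenter.eval()

    prompt_encoder = PromptFromImage()
    optimizer = torch.optim.Adam(prompt_encoder.parameters(), lr=1e-2)
    loss_fn = nn.BCEWithLogitsLoss()  # assumed loss, for illustration only

    # Random tensors stand in for images and ground-truth masks.
    images = torch.rand(2, 3, 32, 32)
    targets = (torch.rand(2, 1, 32, 32) > 0.5).float()

    losses = []
    for _ in range(steps):
        optimizer.zero_grad()
        logits = segmenter(images, prompt_encoder(images))
        loss = loss_fn(logits, targets)
        loss.backward()  # gradients pass through the frozen segmenter
        optimizer.step()  # only the prompt encoder is updated
        losses.append(loss.item())
    return prompt_encoder, segmenter, losses
```

Because only `prompt_encoder.parameters()` are handed to the optimizer and the segmenter's parameters have `requires_grad` disabled, the segmenter acts purely as a differentiable, fixed function that turns the new encoder's output into a mask. The shallow deconvolution decoder mentioned in the abstract is not covered by this sketch.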
Results from the Paper
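Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
---|---|---|---|---|---|
Video Polyp Segmentation | SUN-SEG-Easy (Unseen) | AutoSAM | S-measure | 0.815 | # 1 |
Video Polyp Segmentation | SUN-SEG-Easy (Unseen) | AutoSAM | mean E-measure | 0.855 | # 1 |
Video Polyp Segmentation | SUN-SEG-Easy (Unseen) | AutoSAM | weighted F-measure | 0.716 | # 1 |
Video Polyp Segmentation | SUN-SEG-Easy (Unseen) | AutoSAM | mean F-measure | 0.774 | # 1 |
Video Polyp Segmentation | SUN-SEG-Easy (Unseen) | AutoSAM | Dice | 0.753 | # 2 |
Video Polyp Segmentation | SUN-SEG-Easy (Unseen) | AutoSAM | Sensitivity | 0.672 | # 1 |
Video Polyp Segmentation | SUN-SEG-Hard (Unseen) | AutoSAM | S-measure | 0.822 | # 1 |
Video Polyp Segmentation | SUN-SEG-Hard (Unseen) | AutoSAM | mean E-measure | 0.866 | # 1 |
Video Polyp Segmentation | SUN-SEG-Hard (Unseen) | AutoSAM | weighted F-measure | 0.714 | # 1 |
Video Polyp Segmentation | SUN-SEG-Hard (Unseen) | AutoSAM | mean F-measure | 0.764 | # 1 |
Video Polyp Segmentation | SUN-SEG-Hard (Unseen) | AutoSAM | Dice | 0.759 | # 1 |
Video Polyp Segmentation | SUN-SEG-Hard (Unseen) | AutoSAM | Sensitivity | 0.726 | # 1 |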
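
For readers unfamiliar with two of the metrics reported above, the following sketch shows the standard textbook definitions of Dice and Sensitivity on binary masks. It is not the benchmark's evaluation code: the function names are hypothetical, masks are plain nested lists of 0/1 values, and returning 1.0 when a denominator is zero is an assumed convention. How the benchmark thresholds predictions or averages over video frames is not stated here.

```python
def confusion_counts(pred, target):
    """Count true positives, false positives and false negatives over two binary masks."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn


def dice(pred, target):
    """Dice = 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, target)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom  # assumed convention for empty masks


def sensitivity(pred, target):
    """Sensitivity (recall) = TP / (TP + FN)."""
    tp, _, fn = confusion_counts(pred, target)
    denom = tp + fn
    return 1.0 if denom == 0 else tp / denom  # assumed convention for empty targets


# Tiny worked example: TP = 2, FP = 1, FN = 1
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(round(dice(pred, target), 3))         # 0.667
print(round(sensitivity(pred, target), 3))  # 0.667
```

Dice penalizes both false positives and false negatives, while Sensitivity ignores false positives, which is why the two values in the table can diverge for the same model.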