Using Diffusion Models to Generate Synthetic Labelled Data for Medical Image Segmentation

25 Oct 2023  ·  Daniel Saragih, Pascal Tyrrell ·

In this paper, we proposed and evaluated a pipeline for generating synthetic labeled polyp images with the aim of augmenting automatic medical image segmentation models. In doing so, we explored the use of diffusion models to generate and style synthetic labeled data. The HyperKvasir dataset consisting of 1000 images of polyps in the human GI tract obtained from 2008 to 2016 during clinical endoscopies was used for training and testing. Furthermore, we did a qualitative expert review, and computed the Fr\'echet Inception Distance (FID) and Multi-Scale Structural Similarity (MS-SSIM) between the output images and the source images to evaluate our samples. To evaluate its augmentation potential, a segmentation model was trained with the synthetic data to compare their performance with the real data and previous Generative Adversarial Networks (GAN) methods. These models were evaluated using the Dice loss (DL) and Intersection over Union (IoU) score. Our pipeline generated images that more closely resembled real images according to the FID scores (GAN: $118.37 \pm 1.06 \text{ vs SD: } 65.99 \pm 0.37$). Improvements over GAN methods were seen on average when the segmenter was entirely trained (DL difference: $-0.0880 \pm 0.0170$, IoU difference: $0.0993 \pm 0.01493$) or augmented (DL difference: GAN $-0.1140 \pm 0.0900 \text{ vs SD }-0.1053 \pm 0.0981$, IoU difference: GAN $0.01533 \pm 0.03831 \text{ vs SD }0.0255 \pm 0.0454$) with synthetic data. Overall, we obtained more realistic synthetic images and improved segmentation model performance when fully or partially trained on synthetic data.

PDF Abstract

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods