Exploring Multi-Modal Fusion for Image Manipulation Detection and Localization

4 Dec 2023 · Konstantinos Triaridis, Vasileios Mezaris

Recent image manipulation localization and detection techniques usually leverage forensic artifacts and traces that are produced by a noise-sensitive filter, such as SRM and Bayar convolution. In this paper, we showcase that different filters commonly used in such approaches excel at unveiling different types of manipulations and provide complementary forensic traces. Thus, we explore ways of merging the outputs of such filters and aim to leverage the complementary nature of the artifacts produced to perform image manipulation localization and detection (IMLD). We propose two distinct methods: one that produces independent features from each forensic filter and then fuses them (this is referred to as late fusion) and one that performs early mixing of different modal outputs and produces early combined features (this is referred to as early fusion). We demonstrate that both approaches achieve competitive performance for both image manipulation localization and detection, outperforming state-of-the-art models across several datasets.
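The difference between the two fusion strategies described in the abstract can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration: the two kernels are generic high-pass stand-ins (not the actual SRM or Bayar filters), and `encode`, `predict_mask`, and the random weights are hypothetical toy components, not the networks used in the paper.

```python
import numpy as np

# Stand-in high-pass kernels (assumptions; NOT the SRM or Bayar filters).
LAPLACIAN = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)
HORIZONTAL = np.array([[0, 0, 0], [1, -2, 1], [0, 0, 0]], dtype=float)


def filter2d(image, kernel):
    """Same-size 2D cross-correlation with zero padding."""
    kh, kw = kernel.shape
    h, w = image.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros((h, w), dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out


def encode(x, weights):
    """Hypothetical toy encoder: per-pixel linear projection, then ReLU."""
    return np.maximum(x @ weights, 0.0)


def predict_mask(features, head):
    """Hypothetical toy head: per-pixel linear score squashed by a sigmoid."""
    return 1.0 / (1.0 + np.exp(-(features @ head)))


def early_fusion(image, w_enc, head):
    # Mix the filter outputs first, then extract ONE set of combined features.
    stacked = np.stack(
        [filter2d(image, LAPLACIAN), filter2d(image, HORIZONTAL)], axis=-1
    )  # shape (H, W, 2)
    return predict_mask(encode(stacked, w_enc), head)


def late_fusion(image, w_a, w_b, head):
    # Extract features from each filter output independently, then fuse them.
    feat_a = encode(filter2d(image, LAPLACIAN)[..., None], w_a)
    feat_b = encode(filter2d(image, HORIZONTAL)[..., None], w_b)
    return predict_mask(np.concatenate([feat_a, feat_b], axis=-1), head)


rng = np.random.default_rng(0)
image = rng.random((8, 8))  # toy grayscale image
dim = 4                     # toy feature width
mask_early = early_fusion(image, rng.normal(size=(2, dim)), rng.normal(size=dim))
mask_late = late_fusion(
    image,
    rng.normal(size=(1, dim)),
    rng.normal(size=(1, dim)),
    rng.normal(size=2 * dim),
)
```

In this sketch the only structural difference is where the filter outputs meet: before the encoder (early fusion) or after one encoder per filter (late fusion). Both variants return a per-pixel manipulation probability map of the same size as the input.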


Results from the Paper


Task                             Dataset    Model         Metric Name                         Metric Value  Global Rank
Image Manipulation Localization  Casia V1+  Early Fusion  Average Pixel F1 (Fixed threshold)  0.784         # 1
Image Manipulation Detection     Casia V1+  Late Fusion   AUC                                 0.930         # 3
Image Manipulation Detection     Casia V1+  Late Fusion   Balanced Accuracy                   0.860         # 1
Image Manipulation Detection     Casia V1+  Early Fusion  AUC                                 0.929         # 4
Image Manipulation Detection     Casia V1+  Early Fusion  Balanced Accuracy                   0.845         # 2
Image Manipulation Localization  Casia V1+  Late Fusion   Average Pixel F1 (Fixed threshold)  0.775         # 2
Image Manipulation Detection     CocoGlide  Early Fusion  AUC                                 0.755         # 3
Image Manipulation Detection     CocoGlide  Early Fusion  Balanced Accuracy                   0.660         # 2
Image Manipulation Detection     CocoGlide  Late Fusion   AUC                                 0.760         # 2
Image Manipulation Detection     CocoGlide  Late Fusion   Balanced Accuracy                   0.677         # 1
Image Manipulation Localization  CocoGlide  Late Fusion   Average Pixel F1 (Fixed threshold)  0.574         # 1
Image Manipulation Localization  CocoGlide  Early Fusion  Average Pixel F1 (Fixed threshold)  0.553         # 2
Image Manipulation Detection     Columbia   Late Fusion   AUC                                 0.977         # 4
Image Manipulation Detection     Columbia   Late Fusion   Balanced Accuracy                   0.822         # 3
Image Manipulation Localization  Columbia   Early Fusion  Average Pixel F1 (Fixed threshold)  0.888         # 1
Image Manipulation Localization  Columbia   Late Fusion   Average Pixel F1 (Fixed threshold)  0.864         # 2
Image Manipulation Detection     Columbia   Early Fusion  AUC                                 0.996         # 1
Image Manipulation Detection     Columbia   Early Fusion  Balanced Accuracy                   0.962         # 2
Image Manipulation Localization  COVERAGE   Early Fusion  Average Pixel F1 (Fixed threshold)  0.663         # 1
Image Manipulation Detection     COVERAGE   Late Fusion   AUC                                 0.792         # 2
Image Manipulation Detection     COVERAGE   Late Fusion   Balanced Accuracy                   0.720         # 2
Image Manipulation Detection     COVERAGE   Early Fusion  AUC                                 0.839         # 1
Image Manipulation Detection     COVERAGE   Early Fusion  Balanced Accuracy                   0.770         # 1
Image Manipulation Localization  COVERAGE   Late Fusion   Average Pixel F1 (Fixed threshold)  0.641         # 2
Image Manipulation Localization  DSO-1      Late Fusion   Average Pixel F1 (Fixed threshold)  0.899         # 2
Image Manipulation Detection     DSO-1      Late Fusion   AUC                                 0.958         # 3
Image Manipulation Detection     DSO-1      Late Fusion   Balanced Accuracy                   0.830         # 3
Image Manipulation Detection     DSO-1      Early Fusion  AUC                                 0.966         # 2
Image Manipulation Detection     DSO-1      Early Fusion  Balanced Accuracy                   0.935         # 1
Image Manipulation Localization  DSO-1      Early Fusion  Average Pixel F1 (Fixed threshold)  0.869         # 3
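
For readers unfamiliar with the metric names in the results above, the following sketch shows one common way to compute them. It is an illustration under stated assumptions, not the evaluation code behind the reported numbers: the threshold of 0.5, the handling of empty masks, and the function names `pixel_f1`, `average_pixel_f1`, and `balanced_accuracy` are all choices made here.

```python
import numpy as np


def pixel_f1(pred_mask, gt_mask, threshold=0.5):
    """Pixel-level F1 of one predicted mask, binarized at a fixed threshold.

    The threshold value 0.5 is an assumption; the listing only says "fixed".
    """
    pred = np.asarray(pred_mask) >= threshold
    gt = np.asarray(gt_mask).astype(bool)
    tp = np.sum(pred & gt)
    fp = np.sum(pred & ~gt)
    fn = np.sum(~pred & gt)
    denom = 2 * tp + fp + fn
    # Assumed convention: F1 is 0 when neither mask contains positive pixels.
    return float(2 * tp / denom) if denom > 0 else 0.0


def average_pixel_f1(pred_masks, gt_masks, threshold=0.5):
    """Mean of the per-image pixel F1 scores over a dataset."""
    return float(np.mean([pixel_f1(p, g, threshold)
                          for p, g in zip(pred_masks, gt_masks)]))


def balanced_accuracy(scores, labels, threshold=0.5):
    """Mean of true-positive rate and true-negative rate at image level.

    Assumes both classes (manipulated and authentic) are present in labels.
    """
    pred = np.asarray(scores) >= threshold
    truth = np.asarray(labels).astype(bool)
    tpr = np.sum(pred & truth) / np.sum(truth)
    tnr = np.sum(~pred & ~truth) / np.sum(~truth)
    return float((tpr + tnr) / 2)


# Toy example: one 2x2 predicted mask and four image-level detection scores.
f1 = pixel_f1([[0.9, 0.2], [0.6, 0.1]], [[1, 0], [0, 0]])
bacc = balanced_accuracy([0.9, 0.8, 0.2, 0.7], [1, 1, 0, 0])
```

Localization is scored per pixel and averaged over images, while detection is scored per image; AUC, unlike the two metrics sketched here, does not depend on a threshold.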
