SPAN: Spatial Pyramid Attention Network for Image Manipulation Localization

Techniques for manipulating images are advancing rapidly; while they are helpful for many useful tasks, they also pose a threat to society through their ability to create believable misinformation. We present a novel Spatial Pyramid Attention Network (SPAN) for the detection and localization of multiple types of image manipulation. The proposed architecture efficiently and effectively models the relationship between image patches at multiple scales by constructing a pyramid of local self-attention blocks. The design includes a novel position projection to encode the spatial positions of the patches. SPAN is trained on a synthetic dataset but can also be fine-tuned for specific datasets. The proposed method shows significant gains in performance on standard datasets over previous state-of-the-art methods.
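
To make the idea of a "pyramid of local self-attention blocks" more concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes 3x3 neighbourhoods, identity query/key/value projections, a residual connection, and dilation rates of 1, 3 and 9; none of these details are given in the abstract. The position projection is omitted entirely, and the names `local_self_attention` and `span_pyramid_sketch` are hypothetical.

```python
import math


def local_self_attention(feat, dilation=1):
    """Apply one local self-attention block to an H x W grid of D-dim vectors.

    Assumption: each position attends over its 3x3 neighbourhood sampled at
    the given dilation; neighbours that fall outside the grid are skipped.
    Queries, keys and values are the raw features (no learned projections).
    """
    h, w = len(feat), len(feat[0])
    d = len(feat[0][0])
    out = [[None] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            query = feat[i][j]
            # Collect in-bounds neighbours (the centre itself is always included).
            neighbours = []
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di * dilation, j + dj * dilation
                    if 0 <= y < h and 0 <= x < w:
                        neighbours.append(feat[y][x])
            # Scaled dot-product scores, then a numerically stable softmax.
            scores = [
                sum(a * b for a, b in zip(query, key)) / math.sqrt(d)
                for key in neighbours
            ]
            top = max(scores)
            weights = [math.exp(s - top) for s in scores]
            total = sum(weights)
            attended = [
                sum(wt * v[c] for wt, v in zip(weights, neighbours)) / total
                for c in range(d)
            ]
            # Residual connection (an assumption of this sketch).
            out[i][j] = [a + b for a, b in zip(query, attended)]
    return out


def span_pyramid_sketch(feat, dilations=(1, 3, 9)):
    """Stack local attention blocks with growing dilation (assumed rates)."""
    for rate in dilations:
        feat = local_self_attention(feat, rate)
    return feat
```

Stacking blocks with increasing dilation lets each position first compare itself with immediate neighbours and then with progressively more distant patches, which is one plausible reading of "multiple scales" in the abstract.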

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Image Manipulation Localization | Casia V1+ | SPAN | Average Pixel F1 (Fixed threshold) | .112 | # 8 |
| Image Manipulation Detection | Casia V1+ | SPAN | AUC | .480 | # 8 |
| Image Manipulation Detection | Casia V1+ | SPAN | Balanced Accuracy | .112 | # 8 |
| Image Manipulation Localization | CocoGlide | SPAN | Average Pixel F1 (Fixed threshold) | .298 | # 8 |
| Image Manipulation Detection | CocoGlide | SPAN | AUC | .475 | # 8 |
| Image Manipulation Detection | CocoGlide | SPAN | Balanced Accuracy | .298 | # 7 |
| Image Manipulation Localization | Columbia | SPAN | Average Pixel F1 (Fixed threshold) | .759 | # 5 |
| Image Manipulation Localization | COVERAGE | SPAN | Average Pixel F1 (Fixed threshold) | .235 | # 8 |
| Image Manipulation Detection | COVERAGE | SPAN | AUC | .670 | # 7 |
| Image Manipulation Detection | COVERAGE | SPAN | Balanced Accuracy | .235 | # 8 |
| Image Manipulation Localization | DSO-1 | SPAN | Average Pixel F1 (Fixed threshold) | .233 | # 8 |
| Image Manipulation Detection | DSO-1 | SPAN | AUC | .669 | # 6 |
| Image Manipulation Detection | DSO-1 | SPAN | Balanced Accuracy | .233 | # 8 |
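
For readers unfamiliar with the localization metric, the sketch below shows one common way to compute a pixel-level F1 score at a fixed threshold and average it over images. The benchmark's exact protocol is not stated here, so the 0.5 threshold, the score of 0.0 for an image with no positive pixels in either map, and the function names `pixel_f1` and `average_pixel_f1` are all assumptions.

```python
def pixel_f1(pred, mask, threshold=0.5):
    """Pixel-level F1 for one image.

    pred: 2D list of manipulation probabilities in [0, 1].
    mask: 2D list of ground-truth labels (1 = manipulated, 0 = pristine).
    threshold: fixed binarization threshold (0.5 is an assumed value).
    """
    tp = fp = fn = 0
    for pred_row, mask_row in zip(pred, mask):
        for p, m in zip(pred_row, mask_row):
            predicted_positive = p >= threshold
            if predicted_positive and m:
                tp += 1
            elif predicted_positive and not m:
                fp += 1
            elif m:
                fn += 1
    denom = 2 * tp + fp + fn
    # Assumed convention: return 0.0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0


def average_pixel_f1(preds, masks, threshold=0.5):
    """Mean of per-image pixel F1 scores at a single fixed threshold."""
    scores = [pixel_f1(p, m, threshold) for p, m in zip(preds, masks)]
    return sum(scores) / len(scores)
```

Using one fixed threshold for every image, rather than choosing the best threshold per image, avoids tuning on the test masks and tends to give lower but more comparable scores.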
