NAF-DPM: A Nonlinear Activation-Free Diffusion Probabilistic Model for Document Enhancement

8 Apr 2024 · Giordano Cicchetti, Danilo Comminiello

Real-world documents may suffer various forms of degradation, often resulting in lower accuracy in optical character recognition (OCR) systems. A preprocessing step is therefore essential to eliminate noise while preserving text and key features of documents. In this paper, we propose NAF-DPM, a novel generative framework based on a diffusion probabilistic model (DPM) designed to restore the original quality of degraded documents. While DPMs are recognized for their high-quality generated images, they are also known for their long inference time. To mitigate this problem, we equip the DPM with an efficient nonlinear activation-free (NAF) network and use a fast ordinary differential equation solver as the sampler, which can converge in a few iterations. To better preserve text characters, we introduce an additional differentiable module based on convolutional recurrent neural networks, which simulates the behavior of an OCR system during training. Experiments conducted on various datasets showcase the superiority of our approach, which achieves state-of-the-art performance in terms of pixel-level and perceptual similarity metrics. Furthermore, the results demonstrate a notable reduction in the character errors made by OCR systems when transcribing real-world document images enhanced by our framework. Code and pre-trained models are available at https://github.com/ispamm/NAF-DPM.


Results from the Paper


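Task          Dataset        Model    Metric Name       Metric Value  Global Rank
Binarization  DIBCO 2017     NAF-DPM  F-Measure         93.55         # 2
Binarization  DIBCO 2017     NAF-DPM  PSNR              19.4          # 2
Binarization  DIBCO 2017     NAF-DPM  Pseudo-F-measure  95.76         # 2
Binarization  DIBCO 2019     NAF-DPM  F-Measure         74.61         # 1
Binarization  DIBCO 2019     NAF-DPM  Pseudo-F-measure  76.25         # 1
Binarization  DIBCO 2019     NAF-DPM  PSNR              15.39         # 1
Binarization  H-DIBCO 2018   NAF-DPM  PSNR              19.67         # 5
Binarization  H-DIBCO 2018   NAF-DPM  F-Measure         90.64         # 5
Binarization  H-DIBCO 2018   NAF-DPM  Pseudo-F-measure  94.51         # 4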
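
For readers unfamiliar with the binarization metrics reported above, the following is a minimal sketch of how F-Measure and PSNR are commonly computed for a binarized image against its ground truth. It is not taken from the NAF-DPM code base or from the official DIBCO evaluation tool, whose details may differ. The function names `f_measure` and `psnr`, the convention that text pixels are 0 and background pixels are 1, and the tiny example images are all assumptions made for illustration. Pseudo-F-measure, which additionally weights pixels by their position relative to character strokes, is not covered here.

```python
import math


def f_measure(pred, truth):
    """F-Measure in percent, with text pixels (value 0) as the positive class.

    Assumption: images are nested lists of 0 (text) and 1 (background).
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == 0 and t == 0:
                tp += 1  # text pixel correctly predicted as text
            elif p == 0 and t != 0:
                fp += 1  # background pixel wrongly predicted as text
            elif p != 0 and t == 0:
                fn += 1  # text pixel missed by the prediction
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 100.0 * 2 * precision * recall / (precision + recall)


def psnr(pred, truth, peak=1.0):
    """Peak signal-to-noise ratio in dB; `peak` is the maximum pixel difference."""
    n_pixels = sum(len(row) for row in truth)
    squared_error = sum(
        (p - t) ** 2
        for pred_row, truth_row in zip(pred, truth)
        for p, t in zip(pred_row, truth_row)
    )
    mse = squared_error / n_pixels
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)


# Hypothetical 2x4 images: 0 = text, 1 = background.
truth = [[0, 0, 1, 1],
         [1, 0, 1, 1]]
pred = [[0, 1, 1, 1],
        [1, 0, 0, 1]]

print(f"F-Measure: {f_measure(pred, truth):.2f}")
print(f"PSNR: {psnr(pred, truth):.2f} dB")
```

Higher is better for both metrics: F-Measure balances missed text pixels against spurious ones, while PSNR summarizes the overall pixel-level error.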

Methods