Effective Document Image Enhancement Using tokens-to-token Transformer Network

Document image enhancement is a fundamental stage in any document analysis task, because the many kinds of degradation that affect document images make them harder to recognize and analyze. In this paper, we propose a Tokens-to-Token Transformer network for document image enhancement, a novel encoder-decoder architecture based on a tokens-to-token vision transformer. In a conventional ViT, each image is divided into a set of fixed-length tokens, and the transformer is then applied several times to model the global relationships between the tokens. However, this conventional tokenization does not adequately reflect the crucial local structure between adjacent pixels of the input image, which results in low efficiency. Instead of using a simple ViT with hard splitting of images, the encoder of the proposed architecture uses a progressive tokenization technique to capture this local information and achieve more effective results. Experiments on various DIBCO and H-DIBCO benchmarks demonstrate that the proposed model outperforms existing CNN-based and ViT-based state-of-the-art methods. The primary focus of this work is the application of the proposed architecture to document binarization. The source code will be made available at https://github.com/RisabBiswas/T2T-BinFormer.
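To make the contrast between hard splitting and overlapping (soft) splitting concrete, here is a minimal pure-Python sketch. It covers only the patch-extraction step, not the full progressive tokenization or the transformer layers, and the image size, kernel, stride, and padding values are illustrative assumptions rather than the settings used in the paper. The helper name `split_into_tokens` is hypothetical.

```python
def split_into_tokens(image, kernel, stride, padding=0):
    """Return a list of flattened kernel x kernel patches (tokens).

    image is a 2D list of pixel values. With stride == kernel the patches
    do not overlap (hard split); with stride < kernel neighbouring patches
    share pixels (soft split).
    """
    h, w = len(image), len(image[0])
    # Zero-pad the image on all four sides.
    padded = [[0] * (w + 2 * padding) for _ in range(h + 2 * padding)]
    for r in range(h):
        for c in range(w):
            padded[r + padding][c + padding] = image[r][c]

    ph, pw = len(padded), len(padded[0])
    tokens = []
    for top in range(0, ph - kernel + 1, stride):
        for left in range(0, pw - kernel + 1, stride):
            tokens.append([padded[top + dr][left + dc]
                           for dr in range(kernel)
                           for dc in range(kernel)])
    return tokens


# Toy 8x8 "image" whose pixel values are all distinct and non-zero.
image = [[r * 8 + c + 1 for c in range(8)] for r in range(8)]

# Hard split: stride equals kernel size, so tokens never share pixels.
hard = split_into_tokens(image, kernel=4, stride=4)

# Soft split: stride smaller than kernel size, so adjacent tokens overlap.
soft = split_into_tokens(image, kernel=4, stride=2, padding=1)

# Pixels shared by the first two tokens in each scheme (padding zeros excluded).
hard_shared = (set(hard[0]) & set(hard[1])) - {0}
soft_shared = (set(soft[0]) & set(soft[1])) - {0}

print(len(hard), len(soft))
print(sorted(hard_shared), sorted(soft_shared))
```

In this toy setting the soft split yields more tokens than the hard split, and neighbouring tokens share pixels, which is how local structure across patch boundaries can be carried into the token sequence.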

Results from the Paper


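| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Binarization | DIBCO 2009 | T2T-BinFormer | F-Measure | 96.20 | # 1 |
| Binarization | DIBCO 2009 | T2T-BinFormer | Pseudo-F-measure | 97.62 | # 1 |
| Binarization | DIBCO 2009 | T2T-BinFormer | PSNR | 22.04 | # 1 |
| Binarization | DIBCO 2009 | T2T-BinFormer | DRD | 0.22 | # 1 |
| Binarization | DIBCO 2010 | T2T-BinFormer | PSNR | 23.00 | # 1 |
| Binarization | DIBCO 2010 | T2T-BinFormer | F-Measure | 96.17 | # 1 |
| Binarization | DIBCO 2010 | T2T-BinFormer | Pseudo-F-measure | 97.67 | # 1 |
| Binarization | DIBCO 2010 | T2T-BinFormer | DRD | 0.22 | # 1 |
| Binarization | DIBCO 2011 | T2T-BinFormer | PSNR | 22.17 | # 1 |
| Binarization | DIBCO 2011 | T2T-BinFormer | F-Measure | 96.19 | # 1 |
| Binarization | DIBCO 2011 | T2T-BinFormer | DRD | 0.15 | # 1 |
| Binarization | DIBCO 2011 | T2T-BinFormer | Pseudo-F-measure | 97.63 | # 2 |
| Binarization | DIBCO 2013 | T2T-BinFormer | F-Measure | 97.10 | # 1 |
| Binarization | DIBCO 2013 | T2T-BinFormer | Pseudo-F-measure | 98.23 | # 2 |
| Binarization | DIBCO 2013 | T2T-BinFormer | PSNR | 23.99 | # 1 |
| Binarization | DIBCO 2013 | T2T-BinFormer | DRD | 0.07 | # 1 |
| Binarization | DIBCO 2019 | T2T-BinFormer | F-Measure | 65.70 | # 3 |
| Binarization | DIBCO 2019 | T2T-BinFormer | Pseudo-F-measure | 67.82 | # 3 |
| Binarization | DIBCO 2019 | T2T-BinFormer | PSNR | 14.49 | # 3 |
| Binarization | DIBCO 2019 | T2T-BinFormer | DRD | 0.29 | # 1 |
| Binarization | H-DIBCO 2012 | T2T-BinFormer | PSNR | 23.95 | # 1 |
| Binarization | H-DIBCO 2012 | T2T-BinFormer | F-Measure | 96.80 | # 1 |
| Binarization | H-DIBCO 2012 | T2T-BinFormer | DRD | 0.20 | # 1 |
| Binarization | H-DIBCO 2012 | T2T-BinFormer | Pseudo-F-measure | 98.04 | # 1 |
| Binarization | H-DIBCO 2014 | T2T-BinFormer | F-Measure | 97.50 | # 2 |
| Binarization | H-DIBCO 2014 | T2T-BinFormer | Pseudo-F-measure | 98.50 | # 2 |
| Binarization | H-DIBCO 2014 | T2T-BinFormer | PSNR | 23.48 | # 2 |
| Binarization | H-DIBCO 2014 | T2T-BinFormer | DRD | 0.21 | # 1 |
| Binarization | H-DIBCO 2018 | T2T-BinFormer | PSNR | 22.33 | # 1 |
| Binarization | H-DIBCO 2018 | T2T-BinFormer | F-Measure | 95.60 | # 1 |
| Binarization | H-DIBCO 2018 | T2T-BinFormer | DRD | 0.13 | # 1 |
| Binarization | H-DIBCO 2018 | T2T-BinFormer | Pseudo-F-measure | 96.97 | # 2 |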
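
For readers unfamiliar with the metrics listed above, the sketch below shows how F-Measure and PSNR are commonly computed for a binarized image against its ground truth. It assumes the standard definitions (F-Measure as the harmonic mean of precision and recall over text pixels, in percent; PSNR as 10·log10(C²/MSE) with C = 1 for 0/1 images). It is not the evaluation code behind the reported numbers, and it does not cover Pseudo-F-measure or DRD. The function names and the toy images are illustrative.

```python
import math


def f_measure(pred, truth):
    """F-Measure in percent; images are 2D lists with 1 = text, 0 = background."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == 1 and t == 1:
                tp += 1          # text pixel correctly detected
            elif p == 1 and t == 0:
                fp += 1          # background pixel marked as text
            elif p == 0 and t == 1:
                fn += 1          # text pixel missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 100.0 * 2 * precision * recall / (precision + recall)


def psnr(pred, truth):
    """PSNR in dB for 0/1 images (peak difference C = 1)."""
    n = sum(len(row) for row in truth)
    squared_error = sum((p - t) ** 2
                        for pred_row, truth_row in zip(pred, truth)
                        for p, t in zip(pred_row, truth_row))
    mse = squared_error / n
    if mse == 0:
        return math.inf      # identical images
    return 10.0 * math.log10(1.0 / mse)


# Toy 2x4 example: one false positive and one false negative.
truth = [[1, 1, 0, 0],
         [1, 1, 0, 0]]
pred = [[1, 1, 1, 0],
        [1, 0, 0, 0]]

fm = f_measure(pred, truth)
ps = psnr(pred, truth)
print(round(fm, 2), round(ps, 2))
```

Higher F-Measure and PSNR indicate a closer match to the ground truth, whereas for DRD lower values are better.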

Methods