Document Shadow Removal with Foreground Detection Learning From Fully Synthetic Images

2022 · Yuhi Matsuo, Naofumi Akimoto, Yoshimitsu Aoki

Shadow removal for document images is a major task for digitized document applications. Recent shadow removal models have been trained on pairs of shadow images and shadow-free images. However, obtaining a large-scale and diverse dataset is laborious and remains a great challenge, so only small real datasets are available. To create relatively large datasets, a graphic renderer has been used to synthesize shadows; nonetheless, real documents must still be captured. As a result, the number of unique documents is limited, which negatively affects a network’s performance. In this paper, we present a large-scale and diverse dataset called the fully synthetic document shadow removal dataset (FSDSRD), which does not require capturing documents. The experiments showed that networks (pre-)trained on FSDSRD provided better results than networks trained only on real datasets. Additionally, because foreground maps are available in our dataset, we leveraged them during training for multitask learning, which provided noticeable improvements. The code is available at: https://github.com/IsHYuhi/DSRFGD.
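The abstract does not specify how shadows are synthesized or how the foreground maps enter the loss, so the following is only a minimal sketch under stated assumptions, not the authors' DSRFGD implementation. It assumes a simple multiplicative shadow model (pixels inside a soft mask are darkened by an `attenuation` factor) as a toy stand-in for rendering, and a multitask objective made of an L1 reconstruction term on the shadow-free output plus a weighted binary cross-entropy term on a predicted foreground map. The function names `synthesize_shadow` and `multitask_loss` and the parameters `attenuation` and `fg_weight` are hypothetical.

```python
import numpy as np


def synthesize_shadow(clean, shadow_mask, attenuation=0.5):
    """Toy shadow compositing (hypothetical, not the FSDSRD pipeline).

    clean: H x W x 3 float array in [0, 1], the shadow-free document.
    shadow_mask: H x W float array in [0, 1]; 1 means fully shadowed.
    attenuation: brightness factor applied where the mask is 1.
    """
    # Scale is 1 outside the shadow and `attenuation` at full mask strength.
    scale = 1.0 - (1.0 - attenuation) * shadow_mask
    # Broadcast the per-pixel scale over the color channels.
    return clean * scale[..., None]


def multitask_loss(pred_clean, target_clean, pred_fg, target_fg,
                   fg_weight=0.1, eps=1e-7):
    """Assumed multitask objective: L1 on the image plus weighted BCE on the foreground map."""
    # Reconstruction term for the shadow-removal output.
    l1 = np.mean(np.abs(pred_clean - target_clean))
    # Clip probabilities so log() stays finite.
    p = np.clip(pred_fg, eps, 1.0 - eps)
    bce = -np.mean(target_fg * np.log(p) + (1.0 - target_fg) * np.log(1.0 - p))
    return l1 + fg_weight * bce


if __name__ == "__main__":
    clean = np.ones((4, 4, 3))          # blank white page
    mask = np.zeros((4, 4))
    mask[:, :2] = 1.0                    # shadow over the left half
    shadowed = synthesize_shadow(clean, mask, attenuation=0.5)
    fg = np.zeros((4, 4))
    fg[1, 1] = 1.0                       # one "ink" pixel as foreground
    print(multitask_loss(shadowed, clean, fg, fg))
```

Because every pair is generated from a known shadow-free image, the foreground target comes for free with each sample, which is what makes an auxiliary foreground term cheap to add; the weight `fg_weight` would need tuning in practice.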
