no code implementations • 20 Nov 2023 • Mahmoud Limam, Marwa Dhiaf, Yousri Kessentini
In this paper, we introduce FATURA, a pivotal resource for researchers in the field of document analysis and understanding.
no code implementations • 24 Mar 2023 • Marwa Dhiaf, Ahmed Cheikh Rouhou, Yousri Kessentini, Sinda Ben Salem
In this paper, we propose a lite transformer architecture for full-page multi-script handwriting recognition.
no code implementations • 16 Mar 2023 • Marwa Dhiaf, Mohamed Ali Souibgui, Kai Wang, Yuyang Liu, Yousri Kessentini, Alicia Fornés, Ahmed Cheikh Rouhou
In this paper, we explore the potential of continual self-supervised learning to alleviate the catastrophic forgetting problem in handwritten text recognition, as an example of sequence recognition.
no code implementations • 6 Mar 2023 • Sana Khamekhem Jemni, Sourour Ammar, Mohamed Ali Souibgui, Yousri Kessentini, Abbas Cheddad
However, in the case of historical manuscripts, there is a lack of annotated corpora for training.
1 code implementation • 9 Mar 2022 • Mohamed Ali Souibgui, Sanket Biswas, Andres Mafla, Ali Furkan Biten, Alicia Fornés, Yousri Kessentini, Josep Lladós, Lluis Gomez, Dimosthenis Karatzas
In this paper, we propose a Text-Degradation Invariant Auto Encoder (Text-DIAE), a self-supervised model designed to tackle two tasks, text recognition (handwritten or scene-text) and document image enhancement.
1 code implementation • 25 Jan 2022 • Mohamed Ali Souibgui, Sanket Biswas, Sana Khamekhem Jemni, Yousri Kessentini, Alicia Fornés, Josep Lladós, Umapada Pal
Document images can be affected by many degradation scenarios, which cause recognition and processing difficulties.
Ranked #1 on Binarization on H-DIBCO 2011
no code implementations • 8 Dec 2021 • Ahmed Cheikh Rouhou, Marwa Dhiaf, Yousri Kessentini, Sinda Ben Salem
Second, it allows the model to exploit larger bi-dimensional context information to identify the semantic categories, reaching a higher final prediction accuracy.
1 code implementation • 21 Jul 2021 • Mohamed Ali Souibgui, Alicia Fornés, Yousri Kessentini, Beáta Megyesi
Since this retraining would require annotation of thousands of handwritten symbols together with their bounding boxes, we propose to avoid such human effort through an unsupervised progressive learning approach that automatically assigns pseudo-labels to the non-annotated data.
1 code implementation • 26 May 2021 • Sana Khamekhem Jemni, Mohamed Ali Souibgui, Yousri Kessentini, Alicia Fornés
Unlike most well-known document binarization methods, which try to improve the visual quality of the degraded document, the proposed architecture integrates a handwritten text recognizer that encourages the generated document image to be more readable.
Ranked #1 on Binarization on H-DIBCO 2016
no code implementations • 11 May 2021 • Mohamed Ali Souibgui, Ali Furkan Biten, Sounak Dey, Alicia Fornés, Yousri Kessentini, Lluis Gomez, Dimosthenis Karatzas, Josep Lladós
Low-resource Handwritten Text Recognition (HTR) is a hard problem due to scarce annotated data and very limited linguistic information (dictionaries and language models).
4 code implementations • 17 Oct 2020 • Mohamed Ali Souibgui, Yousri Kessentini
Documents often exhibit various forms of degradation, which make them hard to read and substantially deteriorate the performance of an OCR system.
1 code implementation • 26 Sep 2020 • Mohamed Ali Souibgui, Alicia Fornés, Yousri Kessentini, Crina Tudor
Encoded (or ciphered) manuscripts are a special type of historical document that contains encrypted text.