no code implementations • 10 May 2024 • Khanh Nguyen, Dimosthenis Karatzas
We focus on the problem of Document VQA, a task particularly suited to this approach, as the type of reasoning capabilities required from the model can be quite different in diverse domains.
1 code implementation • 29 Apr 2024 • Lei Kang, Rubèn Tito, Ernest Valveny, Dimosthenis Karatzas
In particular, we employ a visual-only document representation, leveraging the encoder from a document understanding model, Pix2Struct.
1 code implementation • 29 Apr 2024 • Lei Kang, Mohamed Ali Souibgui, Fei Yang, Lluis Gomez, Ernest Valveny, Dimosthenis Karatzas
In our research, we explore machine unlearning for document classification problems, representing, to the best of our knowledge, the first investigation into this area.
no code implementations • 6 Mar 2024 • Emanuele Vivoli, Joan Lafuente Baeza, Ernest Valveny Llobet, Dimosthenis Karatzas
This work explores a closure task in comics, a medium where visual and textual elements are intricately intertwined.
no code implementations • 15 Dec 2023 • Rubèn Tito, Khanh Nguyen, Marlon Tobaben, Raouf Kerkouche, Mohamed Ali Souibgui, Kangsoo Jung, Lei Kang, Ernest Valveny, Antti Honkela, Mario Fritz, Dimosthenis Karatzas
We employ a federated learning scheme that reflects the real-life distribution of documents in different businesses, and we explore the use case where the ID of the invoice issuer is the sensitive information to be protected.
1 code implementation • 5 Sep 2023 • Sergi Garcia-Bordils, Dimosthenis Karatzas, Marçal Rusiñol
We introduce the structured scene-text spotting task, which requires a scene-text OCR system to spot text in the wild according to a query regular expression.
no code implementations • 4 Sep 2023 • Soumya Jahagirdar, Minesh Mathew, Dimosthenis Karatzas, C. V. Jawahar
Researchers have extensively studied the field of vision and language, discovering that both visual and textual content is crucial for understanding scenes effectively.
no code implementations • 8 Jul 2023 • George Tom, Minesh Mathew, Sergi Garcia, Dimosthenis Karatzas, C. V. Jawahar
Text and signs around roads provide crucial information for drivers, vital for safe navigation and situational awareness.
no code implementations • 5 Jun 2023 • Wenwen Yu, Chengquan Zhang, Haoyu Cao, Wei Hua, Bohan Li, Huang Chen, MingYu Liu, Mingrui Chen, Jianfeng Kuang, Mengjun Cheng, Yuning Du, Shikun Feng, Xiaoguang Hu, Pengyuan Lyu, Kun Yao, Yuechen Yu, Yuliang Liu, Wanxiang Che, Errui Ding, Cheng-Lin Liu, Jiebo Luo, Shuicheng Yan, Min Zhang, Dimosthenis Karatzas, Xing Sun, Jingdong Wang, Xiang Bai
It is hoped that this competition will attract many researchers in the field of CV and NLP, and bring some new thoughts to the field of Document AI.
no code implementations • 24 Apr 2023 • Wenwen Yu, MingYu Liu, Mingrui Chen, Ning Lu, Yinlong Wen, Yuliang Liu, Dimosthenis Karatzas, Xiang Bai
To promote research in this area, we organized the ICDAR 2023 competition on reading the seal title (ReST), which included two tasks: seal title text detection (Task 1) and end-to-end seal title recognition (Task 2).
no code implementations • 10 Apr 2023 • Weijia Wu, Yuzhong Zhao, Zhuang Li, Jiahong Li, Mike Zheng Shou, Umapada Pal, Dimosthenis Karatzas, Xiang Bai
In this competition report, we establish a video text reading benchmark, DSText, which focuses on the challenges of reading dense and small text in videos across various scenarios.
1 code implementation • 11 Feb 2023 • Štěpán Šimsa, Milan Šulc, Michal Uřičář, Yash Patel, Ahmed Hamdi, Matěj Kocián, Matyáš Skalický, Jiří Matas, Antoine Doucet, Mickaël Coustaty, Dimosthenis Karatzas
This paper introduces the DocILE benchmark with the largest dataset of business documents for the tasks of Key Information Localization and Extraction and Line Item Recognition.
1 code implementation • 7 Dec 2022 • Rubèn Tito, Dimosthenis Karatzas, Ernest Valveny
The proposed method is based on a hierarchical transformer architecture where the encoder summarizes the most relevant information of every page and the decoder then takes this summarized information to generate the final answer.
no code implementations • 10 Nov 2022 • Soumya Jahagirdar, Minesh Mathew, Dimosthenis Karatzas, C. V. Jawahar
We demonstrate the limitations of current Scene Text VQA and VideoQA methods and propose ways to incorporate scene text information into VideoQA methods.
no code implementations • 21 Sep 2022 • Khanh Nguyen, Ali Furkan Biten, Andres Mafla, Lluis Gomez, Dimosthenis Karatzas
Particularly, a similar Wikimedia image can be used to illustrate different articles, and the produced caption needs to be adapted to a specific context, therefore allowing us to explore the limits of a model to adjust captions to different contextual information.
no code implementations • 14 Sep 2022 • Emanuele Vivoli, Ali Furkan Biten, Andres Mafla, Dimosthenis Karatzas, Lluis Gomez
In this paper, we present a framework for Multilingual Scene Text Visual Question Answering that deals with new languages in a zero-shot fashion.
no code implementations • 14 Sep 2022 • Sergi Garcia-Bordils, Andrés Mafla, Ali Furkan Biten, Oren Nuriel, Aviad Aberdam, Shai Mazor, Ron Litman, Dimosthenis Karatzas
This paper presents the final results of the Out-Of-Vocabulary 2022 (OOV) challenge.
1 code implementation • 9 Mar 2022 • Mohamed Ali Souibgui, Sanket Biswas, Andres Mafla, Ali Furkan Biten, Alicia Fornés, Yousri Kessentini, Josep Lladós, Lluis Gomez, Dimosthenis Karatzas
In this paper, we propose a Text-Degradation Invariant Auto Encoder (Text-DIAE), a self-supervised model designed to tackle two tasks, text recognition (handwritten or scene-text) and document image enhancement.
1 code implementation • 25 Feb 2022 • Ali Furkan Biten, Rubèn Tito, Lluis Gomez, Ernest Valveny, Dimosthenis Karatzas
It is our hope that OCR-IDL can be a starting point for future works on Document Intelligence.
no code implementations • 10 Nov 2021 • Rubèn Tito, Minesh Mathew, C. V. Jawahar, Ernest Valveny, Dimosthenis Karatzas
In this report we present results of the ICDAR 2021 edition of the Document Visual Question Challenges.
no code implementations • 6 Oct 2021 • Ali Furkan Biten, Andres Mafla, Lluis Gomez, Dimosthenis Karatzas
In this work, we propose two metrics that evaluate the degree of semantic relevance of retrieved items, independently of their annotated binary relevance.
1 code implementation • 4 Oct 2021 • Ali Furkan Biten, Lluis Gomez, Dimosthenis Karatzas
Explaining an image with missing or non-existent objects is known as object bias (hallucination) in image captioning.
no code implementations • 2 Oct 2021 • Minesh Mathew, Lluis Gomez, Dimosthenis Karatzas, C. V. Jawahar
This work addresses the problem of Question Answering (QA) on handwritten document collections.
no code implementations • 11 May 2021 • Mohamed Ali Souibgui, Ali Furkan Biten, Sounak Dey, Alicia Fornés, Yousri Kessentini, Lluis Gomez, Dimosthenis Karatzas, Josep Lladós
Low-resource Handwritten Text Recognition (HTR) is a hard problem due to scarce annotated data and very limited linguistic information (dictionaries and language models).
no code implementations • 27 Apr 2021 • Rubèn Tito, Dimosthenis Karatzas, Ernest Valveny
Current tasks and methods in Document Understanding aim to process documents as single elements.
no code implementations • 26 Apr 2021 • Minesh Mathew, Viraj Bagal, Rubèn Pérez Tito, Dimosthenis Karatzas, Ernest Valveny, C. V. Jawahar
Infographics are documents designed to effectively communicate information using a combination of textual, graphical and visual elements.
1 code implementation • 18 Mar 2021 • Zheng Huang, Kai Chen, Jianhua He, Xiang Bai, Dimosthenis Karatzas, Shijian Lu, C. V. Jawahar
In this competition, we set up three tasks, namely, Scanned Receipt Text Localisation (Task 1), Scanned Receipt OCR (Task 2) and Key Information Extraction from Scanned Receipts (Task 3).
1 code implementation • 8 Dec 2020 • Andrés Mafla, Rafael Sampaio de Rezende, Lluís Gómez, Diane Larlus, Dimosthenis Karatzas
Then, armed with this dataset, we describe several approaches which leverage scene text, including a better scene-text aware cross-modal retrieval method which uses specialized representations for text from the captions and text from the visual scene, and reconciles them in a common embedding space.
1 code implementation • 21 Sep 2020 • Andres Mafla, Sounak Dey, Ali Furkan Biten, Lluis Gomez, Dimosthenis Karatzas
Scene text instances found in natural images carry explicit semantic information that can provide important cues to solve a wide array of computer vision problems.
no code implementations • 20 Aug 2020 • Minesh Mathew, Ruben Tito, Dimosthenis Karatzas, R. Manmatha, C. V. Jawahar
For Task 1, a new dataset is introduced comprising 50,000 question-answer pairs defined over 12,767 document images.
1 code implementation • 11 Aug 2020 • Raul Gomez, Yahui Liu, Marco De Nadai, Dimosthenis Karatzas, Bruno Lepri, Nicu Sebe
In this paper we propose the use of an image retrieval system to assist the image-to-image translation task.
no code implementations • ECCV 2020 • Raul Gomez, Jaume Gibert, Lluis Gomez, Dimosthenis Karatzas
People from different parts of the globe describe objects and concepts in distinct manners.
no code implementations • 6 Jul 2020 • Klára Janoušková, Jiri Matas, Lluis Gomez, Dimosthenis Karatzas
We present a method for exploiting weakly annotated images to improve text extraction pipelines.
3 code implementations • 1 Jul 2020 • Minesh Mathew, Dimosthenis Karatzas, C. V. Jawahar
The dataset consists of 50,000 questions defined on 12,000+ document images.
Ranked #1 on Visual Question Answering (VQA) on DocVQA val
no code implementations • 1 Jun 2020 • Lluís Gómez, Ali Furkan Biten, Rubèn Tito, Andrés Mafla, Marçal Rusiñol, Ernest Valveny, Dimosthenis Karatzas
This paper presents a new model for the task of scene text visual question answering, in which questions about a given image can only be answered by reading and understanding scene text that is present in it.
2 code implementations • 14 Jan 2020 • Andres Mafla, Sounak Dey, Ali Furkan Biten, Lluis Gomez, Dimosthenis Karatzas
Text contained in an image carries high-level semantics that can be exploited to achieve richer image understanding.
Ranked #1 on Fine-Grained Image Classification on Con-Text
no code implementations • 20 Dec 2019 • Xi Liu, Rui Zhang, Yongsheng Zhou, Qianyi Jiang, Qi Song, Nan Li, Kai Zhou, Lei Wang, Dong Wang, Minghui Liao, Mingkun Yang, Xiang Bai, Baoguang Shi, Dimosthenis Karatzas, Shijian Lu, C. V. Jawahar
21 teams submit results for Task 1, 23 teams submit results for Task 2, 24 teams submit results for Task 3, and 13 teams submit results for Task 4.
1 code implementation • 9 Oct 2019 • Raul Gomez, Jaume Gibert, Lluis Gomez, Dimosthenis Karatzas
In this work we target the problem of hate speech detection in multimodal publications formed by a text and an image.
no code implementations • 17 Sep 2019 • Yipeng Sun, Zihan Ni, Chee-Kheng Chng, Yuliang Liu, Canjie Luo, Chun Chet Ng, Junyu Han, Errui Ding, Jingtuo Liu, Dimosthenis Karatzas, Chee Seng Chan, Lianwen Jin
Robust text reading from street view images provides valuable information for various applications.
1 code implementation • 16 Sep 2019 • Chee-Kheng Chng, Yuliang Liu, Yipeng Sun, Chun Chet Ng, Canjie Luo, Zihan Ni, ChuanMing Fang, Shuaitao Zhang, Junyu Han, Errui Ding, Jingtuo Liu, Dimosthenis Karatzas, Chee Seng Chan, Lianwen Jin
This paper reports the ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text (RRC-ArT) that consists of three major challenges: i) scene text detection, ii) scene text recognition, and iii) scene text spotting.
no code implementations • 1 Jul 2019 • Nibal Nayef, Yash Patel, Michal Busta, Pinaki Nath Chowdhury, Dimosthenis Karatzas, Wafa Khlif, Jiri Matas, Umapada Pal, Jean-Christophe Burie, Cheng-Lin Liu, Jean-Marc Ogier
With the growing cosmopolitan culture of modern cities, the need for robust Multi-Lingual scene Text (MLT) detection and recognition systems has never been more immense.
no code implementations • 30 Jun 2019 • Ali Furkan Biten, Rubèn Tito, Andres Mafla, Lluis Gomez, Marçal Rusiñol, Minesh Mathew, C. V. Jawahar, Ernest Valveny, Dimosthenis Karatzas
ST-VQA introduces an important aspect that is not addressed by any Visual Question Answering system to date, namely the incorporation of scene text to answer questions asked about an image.
1 code implementation • 4 Jun 2019 • Raul Gomez, Ali Furkan Biten, Lluis Gomez, Jaume Gibert, Marçal Rusiñol, Dimosthenis Karatzas
This paper explores the possibilities of image style transfer applied to text while maintaining the original transcriptions.
3 code implementations • ICCV 2019 • Ali Furkan Biten, Ruben Tito, Andres Mafla, Lluis Gomez, Marçal Rusiñol, Ernest Valveny, C. V. Jawahar, Dimosthenis Karatzas
Current visual question answering datasets do not consider the rich semantic information conveyed by text within an image.
1 code implementation • CVPR 2019 • Ali Furkan Biten, Lluis Gomez, Marçal Rusiñol, Dimosthenis Karatzas
We propose a novel captioning method that is able to leverage contextual information provided by the text of news articles associated with an image.
no code implementations • 31 Jan 2019 • Yash Patel, Lluis Gomez, Marçal Rusiñol, Dimosthenis Karatzas, C. V. Jawahar
Cross-modal retrieval methods have been significantly improved in recent years with the use of deep neural networks and large-scale annotated datasets such as ImageNet and Places.
1 code implementation • 7 Jan 2019 • Raul Gomez, Lluis Gomez, Jaume Gibert, Dimosthenis Karatzas
In this work we propose to exploit this freely available data to learn a multimodal image and text embedding, aiming to leverage the semantic knowledge learnt in the text domain and transfer it to a visual model for semantic image retrieval.
1 code implementation • 4 Sep 2018 • Dena Bazazian, Dimosthenis Karatzas, Andrew D. Bagdanov
In this paper we propose a technique to create and exploit an intermediate representation of images based on text attributes which are character probability maps.
3 code implementations • ECCV 2018 • Lluís Gómez, Andrés Mafla, Marçal Rusiñol, Dimosthenis Karatzas
In this way, the text-based image retrieval task can be cast as a simple nearest neighbor search of the query text representation over the outputs of the CNN over the entire image database.
1 code implementation • 20 Aug 2018 • Raul Gomez, Lluis Gomez, Jaume Gibert, Dimosthenis Karatzas
We treat the data separately by language and show that this can be extrapolated to a separate analysis of tourists and locals, and that tourism is reflected in Social Media at a neighborhood level.
1 code implementation • 20 Aug 2018 • Raul Gomez, Lluis Gomez, Jaume Gibert, Dimosthenis Karatzas
In this paper we propose to learn a multimodal image and text embedding from Web and Social Media data, aiming to leverage the semantic knowledge learnt in the text domain and transfer it to a visual model for semantic image retrieval.
1 code implementation • 4 Jul 2018 • Yash Patel, Lluis Gomez, Raul Gomez, Marçal Rusiñol, Dimosthenis Karatzas, C. V. Jawahar
We show that adequate visual features can be learned efficiently by training a CNN to predict the semantic textual context in which a particular image is more probable to appear as an illustration.
no code implementations • 19 Jun 2018 • Anguelos Nicolaou, Sounak Dey, Vincent Christlein, Andreas Maier, Dimosthenis Karatzas
Embedding data into vector spaces is a very popular strategy of pattern recognition methods.
no code implementations • 18 Oct 2017 • Dimosthenis Karatzas, Lluis Gómez, Anguelos Nicolaou, Marçal Rusiñol
The ICDAR Robust Reading Competition (RRC), initiated in 2003 and re-established in 2011, has become a de facto evaluation standard for robust reading systems and algorithms.
no code implementations • CVPR 2017 • Lluis Gomez, Yash Patel, Marçal Rusiñol, Dimosthenis Karatzas, C. V. Jawahar
End-to-end training from scratch of current deep architectures for new computer vision problems would require ImageNet-scale datasets, and this is not always possible.
1 code implementation • 16 Feb 2017 • Dena Bazazian, Raul Gomez, Anguelos Nicolaou, Lluis Gomez, Dimosthenis Karatzas, Andrew D. Bagdanov
Text Proposals have emerged as a class-dependent version of object proposals - efficient approaches to reduce the search space of possible text object locations in an image.
1 code implementation • 10 Apr 2016 • Lluis Gomez-Bigorda, Dimosthenis Karatzas
Motivated by the success of powerful yet expensive techniques to recognize words in a holistic way, object proposal techniques emerge as an alternative to traditional text detectors.
1 code implementation • 24 Feb 2016 • Lluis Gomez, Anguelos Nicolaou, Dimosthenis Karatzas
Instead of resizing input images to a fixed aspect ratio as in the typical use of holistic CNN classifiers, we propose here a patch-based classification framework in order to preserve discriminative parts of the image that are characteristic of its class.
no code implementations • 24 Feb 2016 • Lluis Gomez, Dimosthenis Karatzas
Although widely studied for document images and handwritten documents, it remains an almost unexplored territory for scene text images.
no code implementations • 8 Jan 2016 • Anguelos Nicolaou, Andrew Bagdanov, Lluis Gomez-Bigorda, Dimosthenis Karatzas
In this paper we introduce a script identification method based on hand-crafted texture features and an artificial neural network.
1 code implementation • 8 Sep 2015 • Lluis Gomez, Dimosthenis Karatzas
The use of Object Proposals techniques in the scene text understanding field is innovative.
no code implementations • 23 Apr 2015 • Anguelos Nicolaou, Andrew D. Bagdanov, Marcus Liwicki, Dimosthenis Karatzas
In this paper we present the use of Sparse Radial Sampling Local Binary Patterns, a variant of Local Binary Patterns (LBP) for text-as-texture classification.
no code implementations • 28 Jul 2014 • Lluis Gomez, Dimosthenis Karatzas
Typography and layout lead to the hierarchical organisation of text in words, text lines, and paragraphs.