Search Results for author: Dimosthenis Karatzas

Found 63 papers, 29 papers with code

Federated Document Visual Question Answering: A Pilot Study

no code implementations • 10 May 2024 • Khanh Nguyen, Dimosthenis Karatzas

We focus on the problem of Document VQA, a task particularly suited to this approach, as the type of reasoning capabilities required from the model can be quite different in diverse domains.

Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism

1 code implementation • 29 Apr 2024 • Lei Kang, Rubèn Tito, Ernest Valveny, Dimosthenis Karatzas

In particular, we employ a visual-only document representation, leveraging the encoder from a document understanding model, Pix2Struct.

document understanding Optical Character Recognition +3

Machine Unlearning for Document Classification

1 code implementation • 29 Apr 2024 • Lei Kang, Mohamed Ali Souibgui, Fei Yang, Lluis Gomez, Ernest Valveny, Dimosthenis Karatzas

In our research, we explore machine unlearning for document classification problems, representing, to the best of our knowledge, the first investigation into this area.

Classification Document Classification +2

Multimodal Transformer for Comics Text-Cloze

no code implementations • 6 Mar 2024 • Emanuele Vivoli, Joan Lafuente Baeza, Ernest Valveny Llobet, Dimosthenis Karatzas

This work explores a closure task in comics, a medium where visual and textual elements are intricately intertwined.

Language Modelling Large Language Model +1

Privacy-Aware Document Visual Question Answering

no code implementations • 15 Dec 2023 • Rubèn Tito, Khanh Nguyen, Marlon Tobaben, Raouf Kerkouche, Mohamed Ali Souibgui, Kangsoo Jung, Lei Kang, Ernest Valveny, Antti Honkela, Mario Fritz, Dimosthenis Karatzas

We employ a federated learning scheme that reflects the real-life distribution of documents in different businesses, and we explore the use case where the ID of the invoice issuer is the sensitive information to be protected.

document understanding Federated Learning +3

STEP -- Towards Structured Scene-Text Spotting

1 code implementation • 5 Sep 2023 • Sergi Garcia-Bordils, Dimosthenis Karatzas, Marçal Rusiñol

We introduce the structured scene-text spotting task, which requires a scene-text OCR system to spot text in the wild according to a query regular expression.

Optical Character Recognition (OCR) Scene Text Detection +2

Understanding Video Scenes through Text: Insights from Text-based Video Question Answering

no code implementations • 4 Sep 2023 • Soumya Jahagirdar, Minesh Mathew, Dimosthenis Karatzas, C. V. Jawahar

Researchers have extensively studied the field of vision and language, discovering that both visual and textual content is crucial for understanding scenes effectively.

Domain Adaptation Question Answering +1

Reading Between the Lanes: Text VideoQA on the Road

no code implementations • 8 Jul 2023 • George Tom, Minesh Mathew, Sergi Garcia, Dimosthenis Karatzas, C. V. Jawahar

Text and signs around roads provide crucial information for drivers, vital for safe navigation and situational awareness.

Question Answering Scene Text Recognition +1

ICDAR 2023 Competition on Structured Text Extraction from Visually-Rich Document Images

no code implementations • 5 Jun 2023 • Wenwen Yu, Chengquan Zhang, Haoyu Cao, Wei Hua, Bohan Li, Huang Chen, MingYu Liu, Mingrui Chen, Jianfeng Kuang, Mengjun Cheng, Yuning Du, Shikun Feng, Xiaoguang Hu, Pengyuan Lyu, Kun Yao, Yuechen Yu, Yuliang Liu, Wanxiang Che, Errui Ding, Cheng-Lin Liu, Jiebo Luo, Shuicheng Yan, Min Zhang, Dimosthenis Karatzas, Xing Sun, Jingdong Wang, Xiang Bai

It is hoped that this competition will attract many researchers in the field of CV and NLP, and bring some new thoughts to the field of Document AI.

Document AI Entity Linking +1

ICDAR 2023 Competition on Reading the Seal Title

no code implementations • 24 Apr 2023 • Wenwen Yu, MingYu Liu, Mingrui Chen, Ning Lu, Yinlong Wen, Yuliang Liu, Dimosthenis Karatzas, Xiang Bai

To promote research in this area, we organized ICDAR 2023 competition on reading the seal title (ReST), which included two tasks: seal title text detection (Task 1) and end-to-end seal title recognition (Task 2).

Optical Character Recognition (OCR) Task 2 +1

ICDAR 2023 Video Text Reading Competition for Dense and Small Text

no code implementations • 10 Apr 2023 • Weijia Wu, Yuzhong Zhao, Zhuang Li, Jiahong Li, Mike Zheng Shou, Umapada Pal, Dimosthenis Karatzas, Xiang Bai

In this competition report, we establish a video text reading benchmark, DSText, which focuses on dense and small text reading challenges in the video with various scenarios.

Task 2 Text Detection +2

DocILE Benchmark for Document Information Localization and Extraction

1 code implementation • 11 Feb 2023 • Štěpán Šimsa, Milan Šulc, Michal Uřičář, Yash Patel, Ahmed Hamdi, Matěj Kocián, Matyáš Skalický, Jiří Matas, Antoine Doucet, Mickaël Coustaty, Dimosthenis Karatzas

This paper introduces the DocILE benchmark with the largest dataset of business documents for the tasks of Key Information Localization and Extraction and Line Item Recognition.

Key Information Extraction Unsupervised Pre-training

Hierarchical multimodal transformers for Multi-Page DocVQA

1 code implementation • 7 Dec 2022 • Rubèn Tito, Dimosthenis Karatzas, Ernest Valveny

The proposed method is based on a hierarchical transformer architecture where the encoder summarizes the most relevant information of every page and then, the decoder takes this summarized information to generate the final answer.

Decoder Question Answering +1

Watching the News: Towards VideoQA Models that can Read

no code implementations • 10 Nov 2022 • Soumya Jahagirdar, Minesh Mathew, Dimosthenis Karatzas, C. V. Jawahar

We demonstrate the limitations of current Scene Text VQA and VideoQA methods and propose ways to incorporate scene text information into VideoQA methods.

Question Answering Video Question Answering +1

Show, Interpret and Tell: Entity-aware Contextualised Image Captioning in Wikipedia

no code implementations • 21 Sep 2022 • Khanh Nguyen, Ali Furkan Biten, Andres Mafla, Lluis Gomez, Dimosthenis Karatzas

Particularly, a similar Wikimedia image can be used to illustrate different articles, and the produced caption needs to be adapted to a specific context, therefore allowing us to explore the limits of a model to adjust captions to different contextual information.

Image Captioning

MUST-VQA: MUltilingual Scene-text VQA

no code implementations • 14 Sep 2022 • Emanuele Vivoli, Ali Furkan Biten, Andres Mafla, Dimosthenis Karatzas, Lluis Gomez

In this paper, we present a framework for Multilingual Scene Text Visual Question Answering that deals with new languages in a zero-shot fashion.

Question Answering Visual Question Answering

Out-of-Vocabulary Challenge Report

no code implementations • 14 Sep 2022 • Sergi Garcia-Bordils, Andrés Mafla, Ali Furkan Biten, Oren Nuriel, Aviad Aberdam, Shai Mazor, Ron Litman, Dimosthenis Karatzas

This paper presents final results of the Out-Of-Vocabulary 2022 (OOV) challenge.

Optical Character Recognition Optical Character Recognition (OCR) +1

Text-DIAE: A Self-Supervised Degradation Invariant Autoencoders for Text Recognition and Document Enhancement

1 code implementation • 9 Mar 2022 • Mohamed Ali Souibgui, Sanket Biswas, Andres Mafla, Ali Furkan Biten, Alicia Fornés, Yousri Kessentini, Josep Lladós, Lluis Gomez, Dimosthenis Karatzas

In this paper, we propose a Text-Degradation Invariant Auto Encoder (Text-DIAE), a self-supervised model designed to tackle two tasks, text recognition (handwritten or scene-text) and document image enhancement.

Document Enhancement Scene Text Recognition

OCR-IDL: OCR Annotations for Industry Document Library Dataset

1 code implementation • 25 Feb 2022 • Ali Furkan Biten, Rubèn Tito, Lluis Gomez, Ernest Valveny, Dimosthenis Karatzas

It is our hope that OCR-IDL can be a starting point for future works on Document Intelligence.

Optical Character Recognition (OCR)

ICDAR 2021 Competition on Document Visual Question Answering

no code implementations • 10 Nov 2021 • Rubèn Tito, Minesh Mathew, C. V. Jawahar, Ernest Valveny, Dimosthenis Karatzas

In this report we present results of the ICDAR 2021 edition of the Document Visual Question Challenges.

Visual Question Answering (VQA)

Is An Image Worth Five Sentences? A New Look into Semantics for Image-Text Matching

no code implementations • 6 Oct 2021 • Ali Furkan Biten, Andres Mafla, Lluis Gomez, Dimosthenis Karatzas

In this work, we propose two metrics that evaluate the degree of semantic relevance of retrieved items, independently of their annotated binary relevance.

Image Captioning Image-text matching +2

Let there be a clock on the beach: Reducing Object Hallucination in Image Captioning

1 code implementation • 4 Oct 2021 • Ali Furkan Biten, Lluis Gomez, Dimosthenis Karatzas

Explaining an image with missing or non-existent objects is known as object bias (hallucination) in image captioning.

Hallucination Image Captioning +1

Asking questions on handwritten document collections

no code implementations • 2 Oct 2021 • Minesh Mathew, Lluis Gomez, Dimosthenis Karatzas, C. V. Jawahar

This work addresses the problem of Question Answering (QA) on handwritten document collections.

Optical Character Recognition (OCR) Question Answering +2

One-shot Compositional Data Generation for Low Resource Handwritten Text Recognition

no code implementations • 11 May 2021 • Mohamed Ali Souibgui, Ali Furkan Biten, Sounak Dey, Alicia Fornés, Yousri Kessentini, Lluis Gomez, Dimosthenis Karatzas, Josep Lladós

Low resource Handwritten Text Recognition (HTR) is a hard problem due to the scarce annotated data and the very limited linguistic information (dictionaries and language models).

Handwritten Text Recognition HTR

Document Collection Visual Question Answering

no code implementations • 27 Apr 2021 • Rubèn Tito, Dimosthenis Karatzas, Ernest Valveny

Current tasks and methods in Document Understanding aim to process documents as single elements.

document understanding Question Answering +1

InfographicVQA

no code implementations • 26 Apr 2021 • Minesh Mathew, Viraj Bagal, Rubèn Pérez Tito, Dimosthenis Karatzas, Ernest Valveny, C. V. Jawahar

Infographics are documents designed to effectively communicate information using a combination of textual, graphical and visual elements.

Question Answering Visual Question Answering

ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction

1 code implementation • 18 Mar 2021 • Zheng Huang, Kai Chen, Jianhua He, Xiang Bai, Dimosthenis Karatzas, Shijian Lu, C. V. Jawahar

In this competition, we set up three tasks, namely, Scanned Receipt Text Localisation (Task 1), Scanned Receipt OCR (Task 2) and Key Information Extraction from Scanned Receipts (Task 3).

Key Information Extraction Optical Character Recognition (OCR) +1

StacMR: Scene-Text Aware Cross-Modal Retrieval

1 code implementation • 8 Dec 2020 • Andrés Mafla, Rafael Sampaio de Rezende, Lluís Gómez, Diane Larlus, Dimosthenis Karatzas

Then, armed with this dataset, we describe several approaches which leverage scene text, including a better scene-text aware cross-modal retrieval method which uses specialized representations for text from the captions and text from the visual scene, and reconcile them in a common embedding space.

Cross-Modal Retrieval Information Retrieval +1

Multi-Modal Reasoning Graph for Scene-Text Based Fine-Grained Image Classification and Retrieval

1 code implementation • 21 Sep 2020 • Andres Mafla, Sounak Dey, Ali Furkan Biten, Lluis Gomez, Dimosthenis Karatzas

Scene text instances found in natural images carry explicit semantic information that can provide important cues to solve a wide array of computer vision problems.

Fine-Grained Image Classification General Classification +2

Document Visual Question Answering Challenge 2020

no code implementations • 20 Aug 2020 • Minesh Mathew, Ruben Tito, Dimosthenis Karatzas, R. Manmatha, C. V. Jawahar

For Task 1, a new dataset is introduced comprising 50,000 question-answer(s) pairs defined over 12,767 document images.

Question Answering Retrieval +2

Retrieval Guided Unsupervised Multi-domain Image-to-Image Translation

1 code implementation • 11 Aug 2020 • Raul Gomez, Yahui Liu, Marco De Nadai, Dimosthenis Karatzas, Bruno Lepri, Nicu Sebe

In this paper we propose the use of an image retrieval system to assist the image-to-image translation task.

Image Retrieval Image-to-Image Translation +2

Location Sensitive Image Retrieval and Tagging

no code implementations • ECCV 2020 • Raul Gomez, Jaume Gibert, Lluis Gomez, Dimosthenis Karatzas

People from different parts of the globe describe objects and concepts in distinct manners.

Image Retrieval Retrieval +1

Text Recognition -- Real World Data and Where to Find Them

no code implementations • 6 Jul 2020 • Klára Janoušková, Jiri Matas, Lluis Gomez, Dimosthenis Karatzas

We present a method for exploiting weakly annotated images to improve text extraction pipelines.

DocVQA: A Dataset for VQA on Document Images

3 code implementations • 1 Jul 2020 • Minesh Mathew, Dimosthenis Karatzas, C. V. Jawahar

The dataset consists of 50,000 questions defined on 12,000+ document images.

Ranked #1 on Visual Question Answering (VQA) on DocVQA val

Question Answering Reading Comprehension +1

Multimodal grid features and cell pointers for Scene Text Visual Question Answering

no code implementations • 1 Jun 2020 • Lluís Gómez, Ali Furkan Biten, Rubèn Tito, Andrés Mafla, Marçal Rusiñol, Ernest Valveny, Dimosthenis Karatzas

This paper presents a new model for the task of scene text visual question answering, in which questions about a given image can only be answered by reading and understanding scene text that is present in it.

Question Answering Visual Question Answering

Fine-grained Image Classification and Retrieval by Combining Visual and Locally Pooled Textual Features

2 code implementations • 14 Jan 2020 • Andres Mafla, Sounak Dey, Ali Furkan Biten, Lluis Gomez, Dimosthenis Karatzas

Text contained in an image carries high-level semantics that can be exploited to achieve richer image understanding.

Ranked #1 on Fine-Grained Image Classification on Con-Text

Classification Fine-Grained Image Classification +5

ICDAR 2019 Robust Reading Challenge on Reading Chinese Text on Signboard

no code implementations • 20 Dec 2019 • Xi Liu, Rui Zhang, Yongsheng Zhou, Qianyi Jiang, Qi Song, Nan Li, Kai Zhou, Lei Wang, Dong Wang, Minghui Liao, Mingkun Yang, Xiang Bai, Baoguang Shi, Dimosthenis Karatzas, Shijian Lu, C. V. Jawahar

21 teams submit results for Task 1, 23 teams submit results for Task 2, 24 teams submit results for Task 3, and 13 teams submit results for Task 4.

Line Detection Task 2

Exploring Hate Speech Detection in Multimodal Publications

1 code implementation • 9 Oct 2019 • Raul Gomez, Jaume Gibert, Lluis Gomez, Dimosthenis Karatzas

In this work we target the problem of hate speech detection in multimodal publications formed by a text and an image.

Hate Speech Detection

ICDAR 2019 Competition on Large-scale Street View Text with Partial Labeling -- RRC-LSVT

no code implementations • 17 Sep 2019 • Yipeng Sun, Zihan Ni, Chee-Kheng Chng, Yuliang Liu, Canjie Luo, Chun Chet Ng, Junyu Han, Errui Ding, Jingtuo Liu, Dimosthenis Karatzas, Chee Seng Chan, Lianwen Jin

Robust text reading from street view images provides valuable information for various applications.

Text Detection Text Spotting +1

ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text (RRC-ArT)

1 code implementation • 16 Sep 2019 • Chee-Kheng Chng, Yuliang Liu, Yipeng Sun, Chun Chet Ng, Canjie Luo, Zihan Ni, ChuanMing Fang, Shuaitao Zhang, Junyu Han, Errui Ding, Jingtuo Liu, Dimosthenis Karatzas, Chee Seng Chan, Lianwen Jin

This paper reports the ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text (RRC-ArT) that consists of three major challenges: i) scene text detection, ii) scene text recognition, and iii) scene text spotting.

Scene Text Detection Scene Text Recognition +2

ICDAR2019 Robust Reading Challenge on Multi-lingual Scene Text Detection and Recognition -- RRC-MLT-2019

no code implementations • 1 Jul 2019 • Nibal Nayef, Yash Patel, Michal Busta, Pinaki Nath Chowdhury, Dimosthenis Karatzas, Wafa Khlif, Jiri Matas, Umapada Pal, Jean-Christophe Burie, Cheng-Lin Liu, Jean-Marc Ogier

With the growing cosmopolitan culture of modern cities, the need of robust Multi-Lingual scene Text (MLT) detection and recognition systems has never been more immense.

Cultural Vocal Bursts Intensity Prediction General Classification +2

ICDAR 2019 Competition on Scene Text Visual Question Answering

no code implementations • 30 Jun 2019 • Ali Furkan Biten, Rubèn Tito, Andres Mafla, Lluis Gomez, Marçal Rusiñol, Minesh Mathew, C. V. Jawahar, Ernest Valveny, Dimosthenis Karatzas

ST-VQA introduces an important aspect that is not addressed by any Visual Question Answering system up to date, namely the incorporation of scene text to answer questions asked about an image.

Question Answering Visual Question Answering

Selective Style Transfer for Text

1 code implementation • 4 Jun 2019 • Raul Gomez, Ali Furkan Biten, Lluis Gomez, Jaume Gibert, Marçal Rusiñol, Dimosthenis Karatzas

This paper explores the possibilities of image style transfer applied to text maintaining the original transcriptions.

Data Augmentation Scene Text Detection +2

Scene Text Visual Question Answering

3 code implementations • ICCV 2019 • Ali Furkan Biten, Ruben Tito, Andres Mafla, Lluis Gomez, Marçal Rusiñol, Ernest Valveny, C. V. Jawahar, Dimosthenis Karatzas

Current visual question answering datasets do not consider the rich semantic information conveyed by text within an image.

Question Answering Visual Question Answering

Good News, Everyone! Context driven entity-aware captioning for news images

1 code implementation • CVPR 2019 • Ali Furkan Biten, Lluis Gomez, Marçal Rusiñol, Dimosthenis Karatzas

We propose a novel captioning method that is able to leverage contextual information provided by the text of news articles associated with an image.

Descriptive Image Captioning

Self-Supervised Visual Representations for Cross-Modal Retrieval

no code implementations • 31 Jan 2019 • Yash Patel, Lluis Gomez, Marçal Rusiñol, Dimosthenis Karatzas, C. V. Jawahar

Cross-modal retrieval methods have been significantly improved in last years with the use of deep neural networks and large-scale annotated datasets such as ImageNet and Places.

Cross-Modal Retrieval Image Classification +3

Self-Supervised Learning from Web Data for Multimodal Retrieval

1 code implementation • 7 Jan 2019 • Raul Gomez, Lluis Gomez, Jaume Gibert, Dimosthenis Karatzas

In this work we propose to exploit this free available data to learn a multimodal image and text embedding, aiming to leverage the semantic knowledge learnt in the text domain and transfer it to a visual model for semantic image retrieval.

Image Retrieval Retrieval +1

Soft-PHOC Descriptor for End-to-End Word Spotting in Egocentric Scene Images

1 code implementation • 4 Sep 2018 • Dena Bazazian, Dimosthenis Karatzas, Andrew D. Bagdanov

In this paper we propose a technique to create and exploit an intermediate representation of images based on text attributes which are character probability maps.

Attribute Dynamic Time Warping +1

Single Shot Scene Text Retrieval

3 code implementations • ECCV 2018 • Lluís Gómez, Andrés Mafla, Marçal Rusiñol, Dimosthenis Karatzas

In this way, the text-based image retrieval task can be cast as a simple nearest neighbor search of the query text representation over the outputs of the CNN over the entire image database.

Image Retrieval Retrieval +2

Learning from #Barcelona Instagram data what Locals and Tourists post about its Neighbourhoods

1 code implementation • 20 Aug 2018 • Raul Gomez, Lluis Gomez, Jaume Gibert, Dimosthenis Karatzas

We perform a language separate treatment of the data and show that it can be extrapolated to a tourists and locals separate analysis, and that tourism is reflected in Social Media at a neighborhood level.

Learning to Learn from Web Data through Deep Semantic Embeddings

1 code implementation • 20 Aug 2018 • Raul Gomez, Lluis Gomez, Jaume Gibert, Dimosthenis Karatzas

In this paper we propose to learn a multimodal image and text embedding from Web and Social Media data, aiming to leverage the semantic knowledge learnt in the text domain and transfer it to a visual model for semantic image retrieval.

Image Retrieval Retrieval

TextTopicNet - Self-Supervised Learning of Visual Features Through Embedding Images on Semantic Text Spaces

1 code implementation • 4 Jul 2018 • Yash Patel, Lluis Gomez, Raul Gomez, Marçal Rusiñol, Dimosthenis Karatzas, C. V. Jawahar

We show that adequate visual features can be learned efficiently by training a CNN to predict the semantic textual context in which a particular image is more probable to appear as an illustration.

Image Classification object-detection +3

Non-deterministic Behavior of Ranking-based Metrics when Evaluating Embeddings

no code implementations • 19 Jun 2018 • Anguelos Nicolaou, Sounak Dey, Vincent Christlein, Andreas Maier, Dimosthenis Karatzas

Embedding data into vector spaces is a very popular strategy of pattern recognition methods.

Computer Security

The Robust Reading Competition Annotation and Evaluation Platform

no code implementations • 18 Oct 2017 • Dimosthenis Karatzas, Lluis Gómez, Anguelos Nicolaou, Marçal Rusiñol

The ICDAR Robust Reading Competition (RRC), initiated in 2003 and re-established in 2011, has become a de-facto evaluation standard for robust reading systems and algorithms.

Management

Self-supervised learning of visual features through embedding images into text topic spaces

no code implementations • CVPR 2017 • Lluis Gomez, Yash Patel, Marçal Rusiñol, Dimosthenis Karatzas, C. V. Jawahar

End-to-end training from scratch of current deep architectures for new computer vision problems would require Imagenet-scale datasets, and this is not always possible.

Image Classification object-detection +3

Improving Text Proposals for Scene Images with Fully Convolutional Networks

1 code implementation • 16 Feb 2017 • Dena Bazazian, Raul Gomez, Anguelos Nicolaou, Lluis Gomez, Dimosthenis Karatzas, Andrew D. Bagdanov

Text Proposals have emerged as a class-dependent version of object proposals - efficient approaches to reduce the search space of possible text object locations in an image.

Object Scene Text Recognition

TextProposals: a Text-specific Selective Search Algorithm for Word Spotting in the Wild

1 code implementation • 10 Apr 2016 • Lluis Gomez-Bigorda, Dimosthenis Karatzas

Motivated by the success of powerful while expensive techniques to recognize words in a holistic way, object proposals techniques emerge as an alternative to the traditional text detectors.

Object

Improving patch-based scene text script identification with ensembles of conjoined networks

1 code implementation • 24 Feb 2016 • Lluis Gomez, Anguelos Nicolaou, Dimosthenis Karatzas

Instead of resizing input images to a fixed aspect ratio as in the typical use of holistic CNN classifiers, we propose here a patch-based classification framework in order to preserve discriminative parts of the image that are characteristic of its class.

General Classification Optical Character Recognition (OCR)

A fine-grained approach to scene text script identification

no code implementations • 24 Feb 2016 • Lluis Gomez, Dimosthenis Karatzas

Although widely studied for document images and handwritten documents, it remains an almost unexplored territory for scene text images.

Scene Text Recognition Text Detection

Visual Script and Language Identification

no code implementations • 8 Jan 2016 • Anguelos Nicolaou, Andrew Bagdanov, Lluis Gomez-Bigorda, Dimosthenis Karatzas

In this paper we introduce a script identification method based on hand-crafted texture features and an artificial neural network.

Language Identification

Object Proposals for Text Extraction in the Wild

1 code implementation • 8 Sep 2015 • Lluis Gomez, Dimosthenis Karatzas

The use of Object Proposals techniques in the scene text understanding field is innovative.

Object

Sparse Radial Sampling LBP for Writer Identification

no code implementations • 23 Apr 2015 • Anguelos Nicolaou, Andrew D. Bagdanov, Marcus Liwicki, Dimosthenis Karatzas

In this paper we present the use of Sparse Radial Sampling Local Binary Patterns, a variant of Local Binary Patterns (LBP) for text-as-texture classification.

Binarization General Classification +1

A Fast Hierarchical Method for Multi-script and Arbitrary Oriented Scene Text Extraction

no code implementations • 28 Jul 2014 • Lluis Gomez, Dimosthenis Karatzas

Typography and layout lead to the hierarchical organisation of text in words, text lines, paragraphs.

Clustering Text Detection +1
