no code implementations • EMNLP (LaTeCHCLfL, CLFL, LaTeCH) 2021 • David Schmidt, Albin Zehe, Janne Lorenzen, Lisa Sergel, Sebastian Düker, Markus Krug, Frank Puppe
The release of this corpus provides an opportunity to train and compare different algorithms for the extraction of character networks, which has so far been barely possible due to the heterogeneous interests of previous researchers.
no code implementations • EACL 2021 • Albin Zehe, Leonard Konle, Lea Katharina Dümpelmann, Evelyn Gius, Andreas Hotho, Fotis Jannidis, Lucas Kaufmann, Markus Krug, Frank Puppe, Nils Reiter, Annekea Schreiber, Nathalie Wiedmer
This paper introduces the novel task of scene segmentation on narrative texts and provides an annotated corpus, a discussion of the linguistic and narrative properties of the task and baseline experiments towards automatic solutions.
no code implementations • 9 Sep 2019 • Christian Reul, Dennis Christ, Alexander Hartelt, Nico Balbach, Maximilian Wehner, Uwe Springmann, Christoph Wick, Christine Grundig, Andreas Büttner, Frank Puppe
Nevertheless, in the last few years great progress has been made in the area of historical OCR, resulting in several powerful open-source tools for preprocessing, layout recognition and segmentation, character recognition and post-processing.
Optical Character Recognition (OCR)
1 code implementation • 8 Oct 2018 • Christian Reul, Uwe Springmann, Christoph Wick, Frank Puppe
In this paper we evaluate Optical Character Recognition (OCR) of 19th century Fraktur scripts without book-specific training using mixed models, i.e., models trained to recognize a variety of fonts and typesets from previously unseen sources.
Optical Character Recognition (OCR)
1 code implementation • 5 Jul 2018 • Christoph Wick, Christian Reul, Frank Puppe
Optical Character Recognition (OCR) on contemporary and historical data remains a focus of many researchers.
Optical Character Recognition (OCR)
1 code implementation • 27 Feb 2018 • Christoph Wick, Christian Reul, Frank Puppe
This paper proposes a combination of a convolutional and an LSTM network to improve the accuracy of OCR on early printed books.
1 code implementation • 27 Feb 2018 • Christian Reul, Uwe Springmann, Christoph Wick, Frank Puppe
We combine three methods which significantly improve the accuracy of OCR models trained on early printed books: (1) The pretraining method utilizes the information stored in already existing models trained on a variety of typesets (mixed models) instead of starting the training from scratch.
1 code implementation • 15 Dec 2017 • Christian Reul, Christoph Wick, Uwe Springmann, Frank Puppe
The evaluation on seven early printed books showed that training from the Latin mixed model reduces the average number of errors by 43% and 26% compared to training from scratch with 60 and 150 lines of ground truth, respectively.
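The percentages above are relative error reductions. A minimal sketch of that arithmetic follows; the function name and the error counts are hypothetical and chosen only to reproduce the reported 43% and 26% figures, not taken from the paper.

```python
def relative_error_reduction(baseline_errors: float, improved_errors: float) -> float:
    """Return the fraction of baseline errors removed by the improved model."""
    if baseline_errors <= 0:
        raise ValueError("baseline_errors must be positive")
    return (baseline_errors - improved_errors) / baseline_errors


# Hypothetical error counts (assumption, for illustration only):
# training from scratch vs. training from a mixed model.
print(round(relative_error_reduction(100, 57), 2))   # 60-line setting
print(round(relative_error_reduction(200, 148), 2))  # 150-line setting
```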
no code implementations • 4 Dec 2017 • Christoph Wick, Frank Puppe
Convolutional neural networks (CNNs) have become popular in the last few years, especially in computer vision, because they achieve outstanding performance on different tasks, such as image classification.
1 code implementation • 27 Nov 2017 • Christian Reul, Uwe Springmann, Christoph Wick, Frank Puppe
Experiments on seven early printed books show that the proposed method considerably outperforms the standard approach, reducing the number of errors by 50% or more in the best cases.
no code implementations • 21 Nov 2017 • Christoph Wick, Frank Puppe
To evaluate this model, we introduce a novel metric called Foreground Pixel Accuracy (FgPA) that is independent of ambiguous ground truth.
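The excerpt does not define FgPA. The sketch below assumes, from the name alone, that it is pixel-wise label accuracy restricted to foreground pixels of a binarized page, so that disagreements on background pixels do not count. The function name, arguments, and toy data are hypothetical.

```python
def foreground_pixel_accuracy(pred, truth, foreground):
    """Accuracy over foreground pixels only (assumed reading of FgPA).

    pred, truth: 2D lists of class labels with identical shape.
    foreground:  2D list of truthy/falsy values marking foreground pixels.
    """
    correct = 0
    total = 0
    for p_row, t_row, f_row in zip(pred, truth, foreground):
        for p, t, f in zip(p_row, t_row, f_row):
            if f:  # background pixels are skipped entirely
                total += 1
                correct += int(p == t)
    if total == 0:
        raise ValueError("no foreground pixels to evaluate")
    return correct / total


# Toy 2x3 page (assumption): three foreground pixels, two labeled correctly.
pred = [[0, 1, 1], [2, 2, 0]]
truth = [[0, 1, 2], [2, 2, 1]]
foreground = [[0, 1, 1], [1, 0, 0]]
print(foreground_pixel_accuracy(pred, truth, foreground))
```

In this toy case the mismatch in the bottom-right pixel is ignored because that pixel is background, which is the property the sketch is meant to illustrate.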
2 code implementations • 20 Jan 2017 • Christian Reul, Uwe Springmann, Frank Puppe
A semi-automatic open-source tool for layout analysis on early printed books is presented.