Zero-Shot Learning Based Approach For Medieval Word Recognition Using Deep-Learned Features

Historical manuscripts reflect our past. Large quantities of historical handwritten documents are now being digitized and archived all over the world. Automatic text indexing and retrieval systems fetch from these digital repositories only the documents an end user is interested in. Regular OCR technology cannot provide this service reliably; a word recognition or word spotting algorithm performs the task instead. Word recognition systems require enough labelled data per class for training, and every word class must be taught beforehand. Word spotting avoids this need for prior training, but such systems often carry additional overheads, such as a language model, to deal with “out of lexicon” words. Zero-shot learning is a possible alternative in this situation. A zero-shot learning algorithm can handle unseen classes, provided it has been given rich discriminating features and a reliable “attribute description” per class during training. Since deep-learned features have enough discriminating power, a deep learning framework is used here for feature extraction. To the best of our knowledge, this is the first work on “out of lexicon” medieval word recognition using a zero-shot learning framework. We obtained very encouraging results (accuracy ≈57% for “out of lexicon” classes) with 166 training classes and 50 unseen test classes.
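
The abstract does not spell out the zero-shot formulation, so the following is only a minimal sketch of the general idea of recognizing unseen classes through per-class attribute descriptions. It assumes a linear ridge-regression map from features to attribute space and a cosine nearest-signature decision rule, neither of which is confirmed by the text above. Random vectors stand in for deep-learned word-image features, and every name in the code (`fit_feature_to_attribute_map`, `predict_zero_shot`, the toy signatures) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def fit_feature_to_attribute_map(X, A, reg=1e-3):
    """Ridge regression: find W such that X @ W approximates A.

    X: (n_samples, n_features) features of seen-class samples.
    A: (n_samples, n_attributes) attribute signature of each sample's class.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ A)


def predict_zero_shot(X, W, signatures):
    """Assign each sample to the class whose signature is most cosine-similar."""
    P = X @ W  # projected attribute vectors
    P = P / (np.linalg.norm(P, axis=1, keepdims=True) + 1e-12)
    S = signatures / (np.linalg.norm(signatures, axis=1, keepdims=True) + 1e-12)
    return np.argmax(P @ S.T, axis=1)


# Toy attribute signatures (assumed, not from the paper).
# Seen classes are used for training; unseen classes appear only at test time.
seen_sigs = np.array(
    [
        [1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 1],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
    ],
    dtype=float,
)
unseen_sigs = np.array([[1, 1, 0, 0], [0, 0, 1, 1]], dtype=float)

# Hypothetical generator that turns attributes into 8-dimensional "features".
M = rng.normal(size=(4, 8))


def sample(signatures, per_class):
    """Draw noisy synthetic features for each class signature."""
    labels = np.repeat(np.arange(len(signatures)), per_class)
    noise = 0.01 * rng.normal(size=(len(labels), M.shape[1]))
    return signatures[labels] @ M + noise, labels


# Train the feature-to-attribute map on seen classes only.
X_train, y_train = sample(seen_sigs, per_class=20)
W = fit_feature_to_attribute_map(X_train, seen_sigs[y_train])

# Evaluate on classes that never appeared during training.
X_test, y_test = sample(unseen_sigs, per_class=20)
pred = predict_zero_shot(X_test, W, unseen_sigs)
accuracy = float(np.mean(pred == y_test))
print(f"accuracy on unseen classes: {accuracy:.2f}")
```

The point of the sketch is that the classifier never sees a training sample from the test classes: it only needs their attribute signatures, which is what lets such a scheme cope with “out of lexicon” words.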
