Optical Character Recognition (OCR)
311 papers with code • 5 benchmarks • 42 datasets
Optical Character Recognition or Optical Character Reader (OCR) is the electronic or mechanical conversion of images of typed, handwritten, or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene photo (for example, the text on signs and billboards in a landscape photo, or license plates on cars), or from subtitle text superimposed on an image (for example, from a television broadcast).
Libraries
Use these libraries to find Optical Character Recognition (OCR) models and implementations.
Most implemented papers
Chinese Text in the Wild
[python3.6] Uses TensorFlow for natural-scene text detection; CTPN+CRNN+CTC implemented in Keras/PyTorch for variable-length scene-text OCR recognition.
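The CTC step mentioned above lets a recognizer emit per-frame label predictions without character-level alignment; decoding then collapses repeats and removes blanks. A minimal sketch of greedy CTC decoding (function name and label indices are illustrative, not from the repo):

```python
BLANK = 0  # CTC blank index, by convention

def ctc_greedy_decode(frame_labels):
    """Collapse consecutive repeated labels, then drop blanks."""
    decoded = []
    prev = None
    for label in frame_labels:
        if label != prev and label != BLANK:
            decoded.append(label)
        prev = label
    return decoded

# Per-frame argmax output with repeats and blanks:
print(ctc_greedy_decode([3, 3, 0, 1, 1, 0, 0, 20]))  # → [3, 1, 20]
```

This is why CTC handles variable-length text: the network can emit as many frames as the image is wide, and decoding recovers the shorter label sequence.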
Exploring Cross-Image Pixel Contrast for Semantic Segmentation
Inspired by the recent advance in unsupervised contrastive representation learning, we propose a pixel-wise contrastive framework for semantic segmentation in the fully supervised setting.
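The core idea of a pixel-wise contrastive term can be sketched as an InfoNCE-style loss: pull a pixel embedding toward a same-class pixel and push it away from other-class pixels. This is a minimal sketch of that idea only; the paper's actual framework adds memory banks and hard-example sampling, and the function here is hypothetical:

```python
import numpy as np

def pixel_contrastive_loss(anchor, positive, negatives, tau=0.1):
    """InfoNCE over one anchor pixel embedding, one positive, many negatives."""
    def sim(a, b):
        # cosine similarity between two embedding vectors
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    pos = np.exp(sim(anchor, positive) / tau)
    negs = sum(np.exp(sim(anchor, n) / tau) for n in negatives)
    return -np.log(pos / (pos + negs))
```

The loss is small when the anchor matches its positive and is dissimilar from the negatives, and large in the opposite arrangement.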
COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images
The goal of COCO-Text is to advance the state of the art in text detection and recognition in natural images.
End-to-End Interpretation of the French Street Name Signs Dataset
We introduce the French Street Name Signs (FSNS) Dataset consisting of more than a million images of street name signs cropped from Google Street View images of France.
DE-GAN: A Conditional Generative Adversarial Network for Document Enhancement
Documents often exhibit various forms of degradation, which makes them hard to read and substantially deteriorates the performance of OCR systems.
OCR-free Document Understanding Transformer
Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task with the OCR outputs.
Attention-based Extraction of Structured Information from Street View Imagery
We present a neural network model - based on CNNs, RNNs and a novel attention mechanism - which achieves 84.2% accuracy on the challenging French Street Name Signs (FSNS) dataset, significantly outperforming the previous state of the art (Smith'16), which achieved 72.46%.
STN-OCR: A single Neural Network for Text Detection and Text Recognition
In contrast to most existing works, which consist of multiple deep neural networks and several pre-processing steps, we propose to use a single deep neural network that learns to detect and recognize text in natural images in a semi-supervised way.
E2E-MLT - an Unconstrained End-to-End Method for Multi-Language Scene Text
An end-to-end trainable (fully differentiable) method for multi-language scene text localization and recognition is proposed.
NRTR: A No-Recurrence Sequence-to-Sequence Model For Scene Text Recognition
Considering that scene images have large variation in text and background, we further design a modality-transform block that effectively transforms 2D input images into 1D sequences, combined with the encoder to extract more discriminative features.
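The 2D-to-1D transformation above can be illustrated by its simplest ingredient: flattening a CNN feature map into a sequence of column vectors for a sequence-to-sequence decoder. NRTR's actual modality-transform block uses learned convolutional layers; this sketch (with an assumed helper name) shows only the reshape idea:

```python
import numpy as np

def feature_map_to_sequence(fmap):
    """(C, H, W) feature map -> sequence of W column vectors of size C*H."""
    c, h, w = fmap.shape
    # move the width axis first, then fuse channels and height per column
    return fmap.transpose(2, 0, 1).reshape(w, c * h)

fmap = np.zeros((64, 4, 25), dtype=np.float32)  # e.g. a CNN output
seq = feature_map_to_sequence(fmap)
print(seq.shape)  # (25, 256)
```

Each of the 25 sequence elements corresponds to one horizontal position in the image, which matches the left-to-right reading order the encoder consumes.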