Optical Character Recognition (OCR)
311 papers with code • 5 benchmarks • 42 datasets
Optical Character Recognition (OCR), also known as Optical Character Reader, is the electronic or mechanical conversion of images of typed, handwritten, or printed text into machine-encoded text. The source may be a scanned document, a photo of a document, a scene photo (for example, text on signs and billboards in a landscape photo, or license plates on cars), or subtitle text superimposed on an image (for example, from a television broadcast).
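As a concrete illustration of the task, the minimal sketch below extracts machine-encoded text from a scanned document image using pytesseract, a Python wrapper around the open-source Tesseract engine. The file name is a placeholder, and the snippet assumes the Tesseract binary is installed locally.

    # Minimal OCR sketch using pytesseract (wrapper around the Tesseract engine).
    # Assumes Tesseract is installed; "scanned_page.png" is a placeholder path.
    from PIL import Image
    import pytesseract

    image = Image.open("scanned_page.png")  # scanned document, photo, or scene crop

    # image_to_string runs the full recognition pipeline and returns plain text
    text = pytesseract.image_to_string(image, lang="eng")
    print(text)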
Libraries
Use these libraries to find Optical Character Recognition (OCR) models and implementations.
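The page's library list is not reproduced here; as one hedged example of the kind of toolkit this section points to, the sketch below uses the open-source EasyOCR library to detect and read scene text, returning bounding boxes alongside the recognized strings. The image path is a placeholder.

    import easyocr  # open-source OCR toolkit; downloads models on first use

    reader = easyocr.Reader(["en"], gpu=False)  # English models, CPU-only
    # readtext returns a list of (bounding_box, text, confidence) tuples
    for box, text, conf in reader.readtext("street_sign.jpg"):
        print(f"{conf:.2f}  {text}")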
Latest papers
ViTextVQA: A Large-Scale Visual Question Answering Dataset for Evaluating Vietnamese Text Comprehension in Images
Visual Question Answering (VQA) is a complex task that requires the ability to process natural language and images simultaneously.
NAF-DPM: A Nonlinear Activation-Free Diffusion Probabilistic Model for Document Enhancement
Real-world documents may suffer various forms of degradation, often resulting in lower accuracy in optical character recognition (OCR) systems.
CMULAB: An Open-Source Framework for Training and Deployment of Natural Language Processing Models
Effectively using Natural Language Processing (NLP) tools in under-resourced languages requires a thorough understanding of the language itself, familiarity with the latest models and training methodologies, and technical expertise to deploy these models.
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
In this paper, we introduce the Draw-and-Understand project: a new model, a multi-domain dataset, and a challenging benchmark for visual prompting.
ChroniclingAmericaQA: A Large-scale Question Answering Dataset based on Historical American Newspaper Pages
To enable realistic testing of QA models, our dataset can be used in three different ways: answering questions from raw and noisy content, from a cleaner, corrected version of the content, or from scanned images of newspaper pages.
Visually Guided Generative Text-Layout Pre-training for Document Intelligence
Prior studies show that pre-training techniques can boost the performance of visual document understanding (VDU), which typically requires models to perceive and reason over both document texts and layouts (e.g., locations of texts and table cells).
PEaCE: A Chemistry-Oriented Dataset for Optical Character Recognition on Scientific Documents
To mitigate this gap, we present the Printed English and Chemical Equations (PEaCE) dataset, containing both synthetic and real-world records, and evaluate the efficacy of transformer-based OCR models when trained on this resource.
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding
In this work, we emphasize the importance of structure information in Visual Document Understanding and propose Unified Structure Learning to boost the performance of MLLMs.
Advancing Multilingual Handwritten Numeral Recognition with Attention-driven Transfer Learning
In this work, we present a robust and cost-effective approach that handles multilingual handwritten numeral recognition across a wide range of languages.
Adversarial Training with OCR Modality Perturbation for Scene-Text Visual Question Answering
Scene-Text Visual Question Answering (ST-VQA) aims to understand scene text in images and answer questions related to the text content.