Optical Character Recognition (OCR)

311 papers with code • 5 benchmarks • 42 datasets

Optical Character Recognition or Optical Character Reader (OCR) is the electronic or mechanical conversion of images of typed, handwritten, or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene photo (for example, the text on signs and billboards in a landscape photo, or license plates on cars), or from subtitle text superimposed on an image (for example, from a television broadcast).
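
As a concrete illustration, the conversion can be driven by an off-the-shelf engine such as Tesseract; the sketch below uses the pytesseract wrapper, with the image path as a placeholder and Tesseract assumed to be installed on the system.

```python
# Minimal OCR sketch using the Tesseract engine via pytesseract.
# Assumes Tesseract is installed; "scan.png" is a placeholder path.
from PIL import Image
import pytesseract

image = Image.open("scan.png")              # scanned page, photo, or scene image
text = pytesseract.image_to_string(image)   # machine-encoded text extracted from the image
print(text)
```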

ViTextVQA: A Large-Scale Visual Question Answering Dataset for Evaluating Vietnamese Text Comprehension in Images

minhquan6203/vitextvqa-dataset 16 Apr 2024

Visual Question Answering (VQA) is a complex task that requires processing natural language and images simultaneously.
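
For orientation, a minimal sketch of what a VQA query looks like in practice, using the Hugging Face transformers visual-question-answering pipeline; the image path, question, and default model choice are illustrative assumptions, not part of the ViTextVQA work.

```python
# Hypothetical VQA query via the Hugging Face transformers pipeline.
# The image path and question are placeholders; the pipeline's default model is used.
from transformers import pipeline

vqa = pipeline("visual-question-answering")
answers = vqa(image="street_sign.jpg", question="What does the sign say?")
print(answers)  # list of {"answer": ..., "score": ...} candidates
```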

NAF-DPM: A Nonlinear Activation-Free Diffusion Probabilistic Model for Document Enhancement

faceonlive/ai-research 8 Apr 2024

Real-world documents may suffer various forms of degradation, often resulting in lower accuracy in optical character recognition (OCR) systems.
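
The paper addresses this with a diffusion model; as a much simpler stand-in that illustrates the enhance-then-recognize idea, the sketch below applies classical denoising and Otsu binarization with OpenCV before running OCR (not the paper's method; the file name is a placeholder).

```python
# Classical document clean-up before OCR (an illustrative stand-in, not NAF-DPM itself).
# "degraded.png" is a placeholder for a degraded grayscale scan.
import cv2
import pytesseract

gray = cv2.imread("degraded.png", cv2.IMREAD_GRAYSCALE)
gray = cv2.medianBlur(gray, 3)                                          # light denoising
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
print(pytesseract.image_to_string(binary))                              # OCR on the enhanced image
```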

CMULAB: An Open-Source Framework for Training and Deployment of Natural Language Processing Models

neulab/cmulab 3 Apr 2024

Effectively using Natural Language Processing (NLP) tools in under-resourced languages requires a thorough understanding of the language itself, familiarity with the latest models and training methodologies, and technical expertise to deploy these models.

Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want

AFeng-x/Draw-and-Understand 29 Mar 2024

In this paper, we introduce the Draw-and-Understand project: a new model, a multi-domain dataset, and a challenging benchmark for visual prompting.

ChroniclingAmericaQA: A Large-scale Question Answering Dataset based on Historical American Newspaper Pages

datascienceuibk/chroniclingamericaqa 26 Mar 2024

To enable realistic testing of QA models, our dataset can be used in three different ways: answering questions from the raw, noisy content; from a cleaner, corrected version of the content; or from scanned images of the newspaper pages.

Visually Guided Generative Text-Layout Pre-training for Document Intelligence

veason-silverbullet/vitlp 25 Mar 2024

Prior studies show that pre-training techniques can boost the performance of visual document understanding (VDU), which typically requires models to perceive and reason over both document text and layout (e.g., the locations of text and table cells).
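
As a rough picture of the text-plus-layout signal such models consume, an OCR engine can emit each word together with its bounding box; the sketch below does this with pytesseract (the image path is a placeholder, and this is not the ViTLP pipeline).

```python
# Words with bounding boxes: the joint text + layout signal VDU models reason over.
# "invoice.png" is a placeholder; Tesseract via pytesseract, not ViTLP itself.
from PIL import Image
import pytesseract
from pytesseract import Output

data = pytesseract.image_to_data(Image.open("invoice.png"), output_type=Output.DICT)
for word, left, top, width, height in zip(
    data["text"], data["left"], data["top"], data["width"], data["height"]
):
    if word.strip():
        print(f"{word!r} at box (x={left}, y={top}, w={width}, h={height})")
```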

PEaCE: A Chemistry-Oriented Dataset for Optical Character Recognition on Scientific Documents

zn1010/peace 23 Mar 2024

To mitigate this gap, we present the Printed English and Chemical Equations (PEaCE) dataset, containing both synthetic and real-world records, and evaluate the efficacy of transformer-based OCR models when trained on this resource.
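
For reference, a minimal sketch of running one widely used transformer-based OCR model, TrOCR from Hugging Face transformers, on a printed-text crop; the checkpoint and image path are illustrative, and this is not the paper's evaluation setup.

```python
# Transformer-based OCR with TrOCR (illustrative; "equation_crop.png" is a placeholder).
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-printed")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-printed")

image = Image.open("equation_crop.png").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```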

mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding

x-plug/mplug-docowl 19 Mar 2024

In this work, we emphasize the importance of structure information in Visual Document Understanding and propose the Unified Structure Learning to boost the performance of MLLMs.

Advancing Multilingual Handwritten Numeral Recognition with Attention-driven Transfer Learning

CVLab-SHUT/HandWrittenDigitRecognition IEEEXplore 2024

In this work, we present a robust and cost-effective approach that handles multilingual handwritten numeral recognition across a wide range of languages.
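
Setting the paper's attention mechanism aside, the transfer-learning component can be sketched generically in PyTorch: reuse a pretrained backbone and retrain only a small classification head for the ten numeral classes (the backbone choice and hyperparameters below are assumptions for illustration, not the paper's model).

```python
# Generic transfer-learning sketch for 10-class numeral recognition (not the paper's model).
# Assumes a recent torchvision with the weights-enum API.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in backbone.parameters():                       # freeze pretrained features
    param.requires_grad = False
backbone.fc = nn.Linear(backbone.fc.in_features, 10)      # new head: 10 numeral classes

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch of digit crops resized to 224x224.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 10, (8,))
optimizer.zero_grad()
loss = criterion(backbone(images), labels)
loss.backward()
optimizer.step()
```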

Adversarial Training with OCR Modality Perturbation for Scene-Text Visual Question Answering

FrankZxShen/ATS 14 Mar 2024

Scene-Text Visual Question Answering (ST-VQA) aims to understand scene text in images and answer questions related to the text content.
