Optical Character Recognition (OCR)
311 papers with code • 5 benchmarks • 42 datasets
Optical Character Recognition (OCR), also known as Optical Character Reader, is the electronic or mechanical conversion of images of typed, handwritten, or printed text into machine-encoded text. The source may be a scanned document, a photo of a document, a scene photo (for example, text on signs and billboards in a landscape photo, or license plates on cars), or subtitle text superimposed on an image (for example, from a television broadcast).
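As a concrete illustration of the task, the minimal sketch below extracts machine-encoded text from a scanned document image using pytesseract, a Python wrapper around the open-source Tesseract engine. The file name is a placeholder, and the snippet assumes the Tesseract binary is installed locally.

    # Minimal OCR sketch using pytesseract (wrapper around the Tesseract engine).
    # Assumes Tesseract is installed; "scanned_page.png" is a placeholder path.
    from PIL import Image
    import pytesseract

    image = Image.open("scanned_page.png")  # scanned document, photo, or scene crop

    # image_to_string runs the full recognition pipeline and returns plain text
    text = pytesseract.image_to_string(image, lang="eng")
    print(text)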
Libraries
Use these libraries to find Optical Character Recognition (OCR) models and implementations.
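The page's library list is not reproduced here; as one hedged example of the kind of toolkit this section points to, the sketch below uses the open-source EasyOCR library to detect and read scene text, returning bounding boxes alongside the recognized strings. The image path is a placeholder.

    import easyocr  # open-source OCR toolkit; downloads models on first use

    reader = easyocr.Reader(["en"], gpu=False)  # English models, CPU-only
    # readtext returns a list of (bounding_box, text, confidence) tuples
    for box, text, conf in reader.readtext("street_sign.jpg"):
        print(f"{conf:.2f}  {text}")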
Latest papers
ViTextVQA: A Large-Scale Visual Question Answering Dataset for Evaluating Vietnamese Text Comprehension in Images
Visual Question Answering (VQA) is a complex task that requires the ability to process natural language and images simultaneously.
NAF-DPM: A Nonlinear Activation-Free Diffusion Probabilistic Model for Document Enhancement
Real-world documents may suffer various forms of degradation, often resulting in lower accuracy in optical character recognition (OCR) systems.
CMULAB: An Open-Source Framework for Training and Deployment of Natural Language Processing Models
Effectively using Natural Language Processing (NLP) tools in under-resourced languages requires a thorough understanding of the language itself, familiarity with the latest models and training methodologies, and technical expertise to deploy these models.
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
In this paper, we introduce the Draw-and-Understand project: a new model, a multi-domain dataset, and a challenging benchmark for visual prompting.
ChroniclingAmericaQA: A Large-scale Question Answering Dataset based on Historical American Newspaper Pages
To enable realistic testing of QA models, our dataset can be used in three different ways: answering questions from raw and noisy content, from a cleaner, corrected version of the content, or from scanned images of newspaper pages.
Visually Guided Generative Text-Layout Pre-training for Document Intelligence
Prior studies show that pre-training techniques can boost the performance of visual document understanding (VDU), which typically requires models to perceive and reason over both document texts and layouts (e.g., locations of texts and table cells).
PEaCE: A Chemistry-Oriented Dataset for Optical Character Recognition on Scientific Documents
To mitigate this gap, we present the Printed English and Chemical Equations (PEaCE) dataset, containing both synthetic and real-world records, and evaluate the efficacy of transformer-based OCR models when trained on this resource.
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding
In this work, we emphasize the importance of structure information in Visual Document Understanding and propose Unified Structure Learning to boost the performance of MLLMs.
Advancing Multilingual Handwritten Numeral Recognition with Attention-driven Transfer Learning
In this work, we present a robust and cost-effective approach that handles multilingual handwritten numeral recognition across a wide range of languages.
Adversarial Training with OCR Modality Perturbation for Scene-Text Visual Question Answering
Scene-Text Visual Question Answering (ST-VQA) aims to understand scene text in images and answer questions related to the text content.