Optical Character Recognition (OCR)

311 papers with code • 5 benchmarks • 42 datasets

Optical Character Recognition or Optical Character Reader (OCR) is the electronic or mechanical conversion of images of typed, handwritten, or printed text into machine-encoded text. The source image may be a scanned document, a photo of a document, a scene photo (for example, the text on signs and billboards in a landscape photo, or license plates on cars), or subtitle text superimposed on an image (for example, from a television broadcast).

Benchmarks

These leaderboards track progress in Optical Character Recognition (OCR).

Dataset	Best Model
Benchmarking Chinese Text Recognition: Datasets, Baselines, and an Empirical Study	DTrOCR
FSNS - Test	AttentionOCR_Inception-resnet-v2_Location
I2L-140K	I2L-NOPOOL
SUT	Tesseract
im2latex-100k	I2L-STRIPS
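Leaderboards like these score a model's recognized text against ground-truth text, and each benchmark defines its own metric. As an illustration only, here is a minimal sketch of one widely used OCR metric, character error rate (CER): the Levenshtein edit distance between reference and hypothesis divided by the number of reference characters. The function names are hypothetical, and nothing here claims that the benchmarks above use CER.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the reference."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One substituted character ("s" read as "c") out of 13 reference characters.
    print(character_error_rate("license plate", "licence plate"))
```

Lower is better: a CER of 0 means the recognized text matches the reference exactly. Because the score is normalized by reference length, it can exceed 1 when the output contains many spurious insertions.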

Libraries

Use these libraries to find Optical Character Recognition (OCR) models and implementations.

PaddlePaddle/PaddleOCR: 18 papers, 38,291 GitHub stars

huggingface/transformers: 6 papers, 124,527 GitHub stars

open-mmlab/mmocr: 6 papers, 4,059 GitHub stars

alibabaresearch/advancedliteratemac…: 5 papers, 894 GitHub stars

The full list contains 10 libraries.

Datasets

Subtasks

Irregular Text Recognition

Handwritten Chinese Text Recognition

Offline Handwritten Chinese Character Recognition

Word Spotting In Handwritten Documents

Handwritten Digit Image Synthesis

Grapheme Detection

Latest papers with no code

TC-OCR: TableCraft OCR for Efficient Detection & Recognition of Table Structure & Content

no code yet • 16 Apr 2024

Our proposed approach achieves an IoU of 0.96 and an OCR accuracy of 78%, an improvement of approximately 25% in OCR accuracy over the previous Table Transformer approach.

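The IoU (intersection over union) figure quoted above compares a predicted region with its ground-truth region: the area they share divided by the area they jointly cover. Below is a minimal sketch of the standard IoU formula for two axis-aligned boxes, assuming (x_min, y_min, x_max, y_max) coordinates. It is not the TC-OCR authors' evaluation code, and how they aggregate IoU over table structures is not stated in the abstract; the `box_iou` name is hypothetical.

```python
from typing import Tuple

# Axis-aligned box as (x_min, y_min, x_max, y_max); this layout is an assumption.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Width and height of the overlap rectangle; clamped to 0 when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = max(0.0, ax2 - ax1) * max(0.0, ay2 - ay1)
    area_b = max(0.0, bx2 - bx1) * max(0.0, by2 - by1)
    union = area_a + area_b - inter
    # Degenerate boxes with zero union get IoU 0 rather than a division error.
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Two 10x10 boxes overlapping on half their width: 50 / 150.
    print(box_iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

An IoU of 1.0 means the predicted and ground-truth boxes coincide exactly; 0.0 means they do not overlap at all.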

Resilience of Large Language Models for Noisy Instructions

no code yet • 15 Apr 2024

In the rapidly advancing domain of natural language processing (NLP), large language models (LLMs) have emerged as powerful tools for interpreting human commands and generating text across various tasks.

Convolution-based Probability Gradient Loss for Semantic Segmentation

no code yet • 10 Apr 2024

In this paper, we introduce a novel Convolution-based Probability Gradient (CPG) loss for semantic segmentation.

VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?

no code yet • 9 Apr 2024

Multimodal large language models (MLLMs) have shown promise in web-related tasks, but evaluating their performance in the web domain remains a challenge due to the lack of comprehensive benchmarks.

Making Old Kurdish Publications Processable by Augmenting Available Optical Character Recognition Engines

no code yet • 9 Apr 2024

An extensive dataset is crucial for developing OCR systems with reasonable accuracy, and because no public datasets are currently available for historical Kurdish documents, this posed a significant challenge in our work.

HAMMR: HierArchical MultiModal React agents for generic VQA

no code yet • 8 Apr 2024

We start from a multimodal ReAct-based system and make it hierarchical by enabling our HAMMR agents to call upon other specialized agents.

Design and Development of a Framework For Stroke-Based Handwritten Gujarati Font Generation

no code yet • 4 Apr 2024

The generation phase involves the user providing a small subset of characters, and the system automatically generates the remaining character glyphs based on extracted strokes and learned rules, resulting in handwritten Gujarati fonts.

Optical Text Recognition in Nepali and Bengali: A Transformer-based Approach

no code yet • 3 Apr 2024

Research and development of OCR systems for low-resource languages is relatively new.

RealKIE: Five Novel Datasets for Enterprise Key Information Extraction

no code yet • 29 Mar 2024

We introduce RealKIE, a benchmark of five challenging datasets aimed at advancing key information extraction methods, with an emphasis on enterprise applications.

SciCapenter: Supporting Caption Composition for Scientific Figures with Machine-Generated Captions and Ratings

no code yet • 26 Mar 2024

Crafting effective captions for figures is important.