Optical Character Recognition (OCR)

313 papers with code • 5 benchmarks • 42 datasets

Optical Character Recognition, or Optical Character Reader (OCR), is the electronic or mechanical conversion of images of typed, handwritten, or printed text into machine-encoded text. The source may be a scanned document, a photo of a document, a scene photo (for example, the text on signs and billboards in a landscape photo, or license plates on cars), or subtitle text superimposed on an image (for example, from a television broadcast).
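To make the image-to-text conversion concrete, here is a toy sketch in Python. It is not how the models listed on this page work (those are learned recognizers); it only illustrates the idea of segmenting an image into glyphs and mapping each glyph to a character. The glyph bitmaps and the names `GLYPHS`, `render`, `segment`, `mismatch`, and `recognize` are all hypothetical, invented for this example.

```python
# Toy template-matching "OCR" on tiny binary images.
# Illustrative assumption: every character is a 3x5 bitmap with no blank
# interior column, and glyphs are separated by one blank column.

GLYPHS = {
    "I": ("###", ".#.", ".#.", ".#.", "###"),
    "L": ("#..", "#..", "#..", "#..", "###"),
    "O": ("###", "#.#", "#.#", "#.#", "###"),
    "T": ("###", ".#.", ".#.", ".#.", ".#."),
}


def render(text):
    """Draw text as a list of row strings ('#' = ink, '.' = background)."""
    return [".".join(GLYPHS[ch][row] for ch in text) for row in range(5)]


def segment(image):
    """Return (start, end) column ranges of glyphs, split at blank columns."""
    width = len(image[0])
    spans, start = [], None
    for x in range(width):
        blank = all(row[x] == "." for row in image)
        if not blank and start is None:
            start = x
        elif blank and start is not None:
            spans.append((start, x))
            start = None
    if start is not None:
        spans.append((start, width))
    return spans


def mismatch(template, crop):
    """Count differing pixels; infinite if the widths do not match."""
    if len(template[0]) != len(crop[0]):
        return float("inf")
    return sum(
        t != c
        for t_row, c_row in zip(template, crop)
        for t, c in zip(t_row, c_row)
    )


def recognize(image):
    """Convert a binary image into text by nearest-template matching."""
    chars = []
    for x0, x1 in segment(image):
        crop = tuple(row[x0:x1] for row in image)
        best = min(GLYPHS, key=lambda ch: mismatch(GLYPHS[ch], crop))
        # Emit '?' when no template has a compatible width.
        chars.append(best if mismatch(GLYPHS[best], crop) != float("inf") else "?")
    return "".join(chars)


image = render("LOT")
print(recognize(image))
```

Because matching picks the template with the fewest differing pixels, a single flipped pixel inside a glyph usually still resolves to the right character. Scene text, handwriting, and irregular layouts break the fixed-template assumption, which is why practical systems rely on trained models.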

Benchmarks


These leaderboards are used to track progress in Optical Character Recognition (OCR).

Dataset	Best Model
Benchmarking Chinese Text Recognition: Datasets, Baselines, and an Empirical Study	DTrOCR
FSNS - Test	AttentionOCR_Inception-resnet-v2_Location
I2L-140K	I2L-NOPOOL
SUT	Tesseract
im2latex-100k	I2L-STRIPS
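Leaderboards like these rank models by a recognition-quality score. The page does not say which metric each benchmark uses, so the following is only a sketch of one widely used OCR measure, character error rate (CER): the Levenshtein edit distance between the predicted and reference strings, divided by the reference length. The function names are hypothetical, and the benchmarks listed above may use different metrics.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    # prev[j] holds the distance between reference[:i-1] and hypothesis[:j].
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,         # delete ref_char
                curr[j - 1] + 1,     # insert hyp_char
                prev[j - 1] + cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """CER = edit distance / reference length (lower is better)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Example: an OCR system misreads the two 'l' characters as '1'.
print(character_error_rate("billboard", "bi11board"))
```

CER can exceed 1.0 when the prediction is much longer than the reference, so it is an error rate rather than a bounded accuracy.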

Libraries

Use these libraries to find Optical Character Recognition (OCR) models and implementations.

PaddlePaddle/PaddleOCR • 18 papers • 38,632

open-mmlab/mmocr • 6 papers • 4,086

alibabaresearch/advancedliteratemac… • 5 papers • 950

Media-Smart/vedastr • 5 papers • 531

See all 10 libraries.

Datasets

Subtasks

Irregular Text Recognition

Handwritten Chinese Text Recognition

Offline Handwritten Chinese Character Recognition

Word Spotting In Handwritten Documents

Handwritten Digit Image Synthesis

Grapheme Detection

Latest papers with no code


Optical Text Recognition in Nepali and Bengali: A Transformer-based Approach

no code yet • 3 Apr 2024

Efforts on the research and development of OCR systems for Low-Resource Languages are relatively new.


RealKIE: Five Novel Datasets for Enterprise Key Information Extraction

no code yet • 29 Mar 2024

We introduce RealKIE, a benchmark of five challenging datasets aimed at advancing key information extraction methods, with an emphasis on enterprise applications.


SciCapenter: Supporting Caption Composition for Scientific Figures with Machine-Generated Captions and Ratings

no code yet • 26 Mar 2024

Crafting effective captions for figures is important.


The Solution for the ICCV 2023 1st Scientific Figure Captioning Challenge

no code yet • 26 Mar 2024

In this paper, we propose a solution for improving the quality of captions generated for figures in papers.


Refining Text-to-Image Generation: Towards Accurate Training-Free Glyph-Enhanced Image Generation

no code yet • 25 Mar 2024

We introduce a benchmark, LenCom-Eval, specifically designed for testing models' capability in generating images with Lengthy and Complex visual text.


Grammatical vs Spelling Error Correction: An Investigation into the Responsiveness of Transformer-based Language Models using BART and MarianMT

no code yet • 25 Mar 2024

Text continues to remain a relevant form of representation for information.


Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs

no code yet • 19 Mar 2024

We propose a technique to transfer capabilities from LLMs to VLMs.


OCR is All you need: Importing Multi-Modality into Image-based Defect Detection System

no code yet • 18 Mar 2024

To address this, we introduce an external modality-guided data mining framework, primarily rooted in optical character recognition (OCR), to extract statistical features from images as a second modality to enhance performance, termed OANet (Ocr-Aoi-Net).


Advanced Knowledge Extraction of Physical Design Drawings, Translation and conversion to CAD formats using Deep Learning

no code yet • 17 Mar 2024

The approach employs object detection models, such as YOLOv7 and Faster R-CNN, to detect the physical drawing objects present in the images, followed by edge detection algorithms, such as the Canny filter, to extract and refine the identified lines from the drawing region, and curve detection techniques to detect circles.


TextBlockV2: Towards Precise-Detection-Free Scene Text Spotting with Pre-trained Language Model

no code yet • 15 Mar 2024

Taking advantage of the fine-tuned language model on scene recognition benchmarks and the paradigm of text block detection, extensive experiments demonstrate the superior performance of our scene text spotter across multiple public benchmarks.

