Optical Character Recognition (OCR)
313 papers with code • 5 benchmarks • 42 datasets
Optical Character Recognition or Optical Character Reader (OCR) is the electronic or mechanical conversion of images of typed, handwritten, or printed text into machine-encoded text. The source may be a scanned document, a photo of a document, a scene photo (for example, text on signs and billboards in a landscape photo, or license plates on cars), or subtitle text superimposed on an image (for example, from a television broadcast).
Libraries
Use these libraries to find Optical Character Recognition (OCR) models and implementations
Subtasks
Latest papers
Adversarial Training with OCR Modality Perturbation for Scene-Text Visual Question Answering
Scene-Text Visual Question Answering (ST-VQA) aims to understand scene text in images and answer questions related to the text content.
DeepSeek-VL: Towards Real-World Vision-Language Understanding
The DeepSeek-VL family (both 1.3B and 7B models) showcases superior user experiences as a vision-language chatbot in real-world applications, achieving state-of-the-art or competitive performance across a wide range of visual-language benchmarks at the same model size while maintaining robust performance on language-centric benchmarks.
TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document
We present TextMonkey, a large multimodal model (LMM) tailored for text-centric tasks.
Syntactic Language Change in English and German: Metrics, Parsers, and Convergences
Even though we have evidence that recent parsers trained on modern treebanks are not heavily affected by data 'noise' such as spelling changes and OCR errors in our historic data, we find that results on syntactic language change are sensitive to the parsers involved, which cautions against relying on a single parser for evaluating syntactic language change, as done in previous work.
TEXTRON: Weakly Supervised Multilingual Text Detection through Data Programming
In order to solve this problem, we propose TEXTRON, a Data Programming-based approach, where users can plug various text detection methods into a weak supervision-based learning framework.
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
We propose SPHINX-X, an extensive Multimodality Large Language Model (MLLM) series developed upon SPHINX.
MouSi: Poly-Visual-Expert Vision-Language Models
This technique introduces a fusion network to unify the processing of outputs from different visual experts, while bridging the gap between image encoders and pre-trained LLMs.
Efficient Multi-domain Text Recognition Deep Neural Network Parameterization with Residual Adapters
Recent advancements in deep neural networks have markedly enhanced the performance of computer vision tasks, yet the specialized nature of these networks often necessitates extensive data and high computational power.
An Empirical Study of Scaling Law for OCR
The laws of model size, data volume, computation and model performance have been extensively studied in the field of Natural Language Processing (NLP).
Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models
Accordingly, we propose Vary, an efficient and effective method to scale up the vision vocabulary of LVLMs.