Optical Character Recognition (OCR)

313 papers with code • 5 benchmarks • 42 datasets

Optical Character Recognition or Optical Character Reader (OCR) is the electronic or mechanical conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene photo (for example, the text on signs and billboards in a landscape photo, or license plates on cars) or from subtitle text superimposed on an image (for example, from a television broadcast).
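
As a concrete illustration of the task defined above, the sketch below runs the open-source Tesseract engine through the pytesseract wrapper. It is a minimal, generic example, not tied to any paper listed on this page, and the input file name is hypothetical.

```python
# Minimal OCR sketch using the open-source Tesseract engine via pytesseract.
# Assumes Tesseract and the pytesseract/Pillow packages are installed.
from PIL import Image
import pytesseract

def ocr_image(path: str, lang: str = "eng") -> str:
    """Convert an image of typed, handwritten or printed text into machine-encoded text."""
    image = Image.open(path)
    return pytesseract.image_to_string(image, lang=lang)

if __name__ == "__main__":
    # "scanned_page.png" is a placeholder input file.
    print(ocr_image("scanned_page.png"))
```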

Adversarial Training with OCR Modality Perturbation for Scene-Text Visual Question Answering

FrankZxShen/ATS 14 Mar 2024

Scene-Text Visual Question Answering (ST-VQA) aims to understand scene text in images and answer questions related to the text content.

3
14 Mar 2024

DeepSeek-VL: Towards Real-World Vision-Language Understanding

deepseek-ai/deepseek-vl 8 Mar 2024

The DeepSeek-VL family (both 1.3B and 7B models) showcases superior user experiences as a vision-language chatbot in real-world applications, achieving state-of-the-art or competitive performance across a wide range of visual-language benchmarks at the same model size while maintaining robust performance on language-centric benchmarks.

1,525
08 Mar 2024

TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document

yuliang-liu/monkey 7 Mar 2024

We present TextMonkey, a large multimodal model (LMM) tailored for text-centric tasks.

1,397
07 Mar 2024

Syntactic Language Change in English and German: Metrics, Parsers, and Convergences

cyr19/syntaxchange 18 Feb 2024

Even though we have evidence that recent parsers trained on modern treebanks are not heavily affected by data 'noise' such as spelling changes and OCR errors in our historical data, we find that estimates of syntactic language change are sensitive to the parsers involved, which cautions against using a single parser for evaluating syntactic language change, as done in previous work.

1
18 Feb 2024

TEXTRON: Weakly Supervised Multilingual Text Detection through Data Programming

IITB-LEAP-OCR/TEXTRON IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2024

In order to solve this problem, we propose TEXTRON, a Data Programming-based approach, where users can plug various text detection methods into a weak supervision-based learning framework.

7
15 Feb 2024
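
The TEXTRON entry above describes plugging several weak text detectors into a data-programming framework. The sketch below shows the general weak-supervision idea, aggregating binary text masks from multiple detectors by per-pixel majority vote; it is an assumed illustration, not TEXTRON's actual labeling functions or training pipeline.

```python
# Generic data-programming sketch: combine several weak text detectors into one
# pseudo-label by per-pixel majority vote. Illustration only, NOT TEXTRON's method;
# the detector outputs below are random placeholders.
import numpy as np

def aggregate_weak_masks(masks: list[np.ndarray]) -> np.ndarray:
    """Combine binary text/non-text masks of shape (H, W) from multiple weak detectors."""
    stacked = np.stack(masks, axis=0).astype(np.float32)
    votes = stacked.mean(axis=0)             # fraction of detectors marking each pixel as text
    return (votes >= 0.5).astype(np.uint8)   # majority vote yields a pseudo-label mask

# Example with three hypothetical weak detectors on a 4x4 image crop.
rng = np.random.default_rng(0)
weak_outputs = [rng.integers(0, 2, size=(4, 4)) for _ in range(3)]
pseudo_label = aggregate_weak_masks(weak_outputs)
```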

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models

alpha-vllm/llama2-accessory 8 Feb 2024

We propose SPHINX-X, an extensive Multimodality Large Language Model (MLLM) series developed upon SPHINX.

2,514
08 Feb 2024

MouSi: Poly-Visual-Expert Vision-Language Models

fudannlplab/mousi 30 Jan 2024

This technique introduces a fusion network to unify the processing of outputs from different visual experts, while bridging the gap between image encoders and pre-trained LLMs.

61
30 Jan 2024
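
The MouSi description above mentions a fusion network that unifies outputs from different visual experts before they reach a pre-trained LLM. The following is a minimal sketch of that general idea, assuming a simple concatenate-and-project fusion; the expert dimensions and the fusion design are illustrative assumptions, not MouSi's actual architecture.

```python
# Generic poly-visual-expert fusion sketch: project each expert's patch features,
# concatenate them, and map into the LLM's embedding space. Illustration only.
import torch
import torch.nn as nn

class SimpleVisualFusion(nn.Module):
    def __init__(self, expert_dims: list[int], llm_dim: int):
        super().__init__()
        # One projection per expert, then a shared projection into the LLM space.
        self.expert_proj = nn.ModuleList([nn.Linear(d, llm_dim) for d in expert_dims])
        self.out_proj = nn.Linear(llm_dim * len(expert_dims), llm_dim)

    def forward(self, expert_feats: list[torch.Tensor]) -> torch.Tensor:
        # Each tensor: (batch, num_patches, expert_dim); align dims, concatenate, project.
        projected = [proj(feat) for proj, feat in zip(self.expert_proj, expert_feats)]
        fused = torch.cat(projected, dim=-1)
        return self.out_proj(fused)

# Two hypothetical experts (e.g. a CLIP-style encoder and an OCR-oriented encoder).
fusion = SimpleVisualFusion([768, 1024], llm_dim=2048)
feats = [torch.randn(1, 196, 768), torch.randn(1, 196, 1024)]
tokens_for_llm = fusion(feats)   # (1, 196, 2048)
```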

Efficient Multi-domain Text Recognition Deep Neural Network Parameterization with Residual Adapters

jiayou-chao/multi-domain-ocr 1 Jan 2024

Recent advancements in deep neural networks have markedly enhanced the performance of computer vision tasks, yet the specialized nature of these networks often necessitates extensive data and high computational power.

3
01 Jan 2024
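
The residual adapters named in the title above are a standard parameter-efficient technique: small bottleneck modules added residually to a frozen backbone so that each new domain trains only a few parameters. The sketch below shows a generic adapter of this kind, with assumed dimensions, not the paper's exact parameterization.

```python
# Generic residual-adapter sketch for multi-domain adaptation. Illustration only.
import torch
import torch.nn as nn

class ResidualAdapter(nn.Module):
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)   # down-project to a small bottleneck
        self.act = nn.ReLU()
        self.up = nn.Linear(bottleneck, dim)     # up-project back to the backbone width

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen backbone's features intact.
        return x + self.up(self.act(self.down(x)))

adapter = ResidualAdapter(dim=512)
features = torch.randn(8, 32, 512)   # (batch, sequence, channels) from a frozen layer
adapted = adapter(features)          # same shape, with a small domain-specific correction
```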

An Empirical Study of Scaling Law for OCR

large-ocr-model/large-ocr-model.github.io 29 Dec 2023

Scaling laws relating model size, data volume, computation and model performance have been extensively studied in the field of Natural Language Processing (NLP).

107
29 Dec 2023
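
The scaling-law study above relates model size, data volume and compute to OCR performance. The snippet below illustrates how a power-law curve of the form error ~ a * N^(-b) can be fit on log-log axes; all numbers are made up for illustration and do not come from the paper.

```python
# Illustrative power-law fit: error ~ a * N^(-b) over (model size, error) pairs.
# The data points below are hypothetical placeholders.
import numpy as np

model_sizes = np.array([1e6, 1e7, 1e8, 1e9])        # parameter counts (hypothetical)
error_rates = np.array([0.30, 0.18, 0.11, 0.07])    # character error rates (hypothetical)

# Linear fit in log space: log(error) = log(a) - b * log(N)
slope, intercept = np.polyfit(np.log(model_sizes), np.log(error_rates), deg=1)
a, b = np.exp(intercept), -slope
print(f"fitted scaling law: error ~ {a:.3f} * N^(-{b:.3f})")
```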

Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models

Ucas-HaoranWei/Vary 11 Dec 2023

Accordingly, we propose Vary, an efficient and effective method to scale up the vision vocabulary of LVLMs.

1,563
11 Dec 2023