Optical Character Recognition (OCR)

313 papers with code • 5 benchmarks • 42 datasets

Optical Character Recognition or Optical Character Reader (OCR) is the electronic or mechanical conversion of images of typed, handwritten, or printed text into machine-encoded text. The source may be a scanned document, a photo of a document, a scene photo (for example, the text on signs and billboards in a landscape photo, or license plates on cars), or subtitle text superimposed on an image (for example, from a television broadcast).

Latest papers with no code

Optical Text Recognition in Nepali and Bengali: A Transformer-based Approach

no code yet • 3 Apr 2024

Research and development efforts on OCR systems for low-resource languages are relatively new.

RealKIE: Five Novel Datasets for Enterprise Key Information Extraction

no code yet • 29 Mar 2024

We introduce RealKIE, a benchmark of five challenging datasets aimed at advancing key information extraction methods, with an emphasis on enterprise applications.

The Solution for the ICCV 2023 1st Scientific Figure Captioning Challenge

no code yet • 26 Mar 2024

In this paper, we propose a solution for improving the quality of captions generated for figures in papers.

Refining Text-to-Image Generation: Towards Accurate Training-Free Glyph-Enhanced Image Generation

no code yet • 25 Mar 2024

We introduce a benchmark, LenCom-Eval, specifically designed for testing models' capability in generating images with Lengthy and Complex visual text.

Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs

no code yet • 19 Mar 2024

We propose a technique to transfer capabilities from LLMs to VLMs.

OCR is All you need: Importing Multi-Modality into Image-based Defect Detection System

no code yet • 18 Mar 2024

We introduce an external modality-guided data mining framework, primarily rooted in optical character recognition (OCR), that extracts statistical features from images as a second modality to enhance performance. The framework is termed OANet (Ocr-Aoi-Net).

Advanced Knowledge Extraction of Physical Design Drawings, Translation and conversion to CAD formats using Deep Learning

no code yet • 17 Mar 2024

The approach employs object detection models, such as YOLOv7 and Faster R-CNN, to detect the physical drawing objects present in the images, followed by edge detection algorithms, such as the Canny filter, to extract and refine the identified lines from the drawing region, and curve detection techniques to detect circles.

TextBlockV2: Towards Precise-Detection-Free Scene Text Spotting with Pre-trained Language Model

no code yet • 15 Mar 2024

Extensive experiments demonstrate that, by taking advantage of a language model fine-tuned on scene recognition benchmarks and the paradigm of text block detection, our scene text spotter achieves superior performance across multiple public benchmarks.