Optical Character Recognition (OCR)
313 papers with code • 5 benchmarks • 42 datasets
Optical Character Recognition, or Optical Character Reader (OCR), is the electronic or mechanical conversion of images of typed, handwritten, or printed text into machine-encoded text. The source may be a scanned document, a photo of a document, a scene photo (for example, the text on signs and billboards in a landscape photo, or license plates on cars), or subtitle text superimposed on an image (for example, from a television broadcast).
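As a rough illustration of what "converting an image of text into machine-encoded text" means, the following is a minimal sketch of template-matching OCR on a tiny bitmap. Everything in it is a simplifying assumption made for this example: the 3x5 glyph templates, the fixed character pitch, and the helper names `render`, `split_glyphs`, and `recognize` are hypothetical and do not come from any OCR library. Systems like those in the papers listed on this page rely on learned models rather than fixed templates.

```python
# Toy glyph templates (hypothetical 3x5 bitmaps: '#' = ink, '.' = background).
TEMPLATES = {
    "O": ("###", "#.#", "#.#", "#.#", "###"),
    "C": ("###", "#..", "#..", "#..", "###"),
    "R": ("##.", "#.#", "##.", "#.#", "#.#"),
}
GLYPH_WIDTH = 3   # columns per character
GAP = 1           # blank columns between characters


def render(text):
    """Draw text as a list of 5 row strings, one blank column between glyphs."""
    return [("." * GAP).join(TEMPLATES[ch][row] for ch in text) for row in range(5)]


def split_glyphs(image):
    """Cut a fixed-pitch text-line bitmap into per-character bitmaps."""
    step = GLYPH_WIDTH + GAP
    count = (len(image[0]) + GAP) // step
    return [
        tuple(row[i * step: i * step + GLYPH_WIDTH] for row in image)
        for i in range(count)
    ]


def hamming(a, b):
    """Count the pixels that differ between two equally sized glyph bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def recognize(image):
    """Label each glyph with the template that has the fewest differing pixels."""
    return "".join(
        min(TEMPLATES, key=lambda ch: hamming(glyph, TEMPLATES[ch]))
        for glyph in split_glyphs(image)
    )


if __name__ == "__main__":
    image = render("OCR")
    # Simulate scanner noise: flip one background pixel inside the first glyph.
    image[1] = image[1][:1] + "#" + image[1][2:]
    print("\n".join(image))
    print(recognize(image))
```

Nearest-template matching tolerates the single flipped pixel here because these three templates differ from one another in at least three pixels. Real images additionally need binarization, layout analysis, and segmentation before any recognition step.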
Latest papers with no code
Optical Text Recognition in Nepali and Bengali: A Transformer-based Approach
Research and development efforts on OCR systems for low-resource languages are relatively new.
RealKIE: Five Novel Datasets for Enterprise Key Information Extraction
We introduce RealKIE, a benchmark of five challenging datasets aimed at advancing key information extraction methods, with an emphasis on enterprise applications.
SciCapenter: Supporting Caption Composition for Scientific Figures with Machine-Generated Captions and Ratings
Crafting effective captions for figures is important.
The Solution for the ICCV 2023 1st Scientific Figure Captioning Challenge
In this paper, we propose a solution for improving the quality of captions generated for figures in papers.
Refining Text-to-Image Generation: Towards Accurate Training-Free Glyph-Enhanced Image Generation
We introduce a benchmark, LenCom-Eval, specifically designed for testing models' capability in generating images with Lengthy and Complex visual text.
Grammatical vs Spelling Error Correction: An Investigation into the Responsiveness of Transformer-based Language Models using BART and MarianMT
Text remains a relevant form of representation for information.
Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs
We propose a technique to transfer capabilities from LLMs to VLMs.
OCR is All you need: Importing Multi-Modality into Image-based Defect Detection System
We introduce an external modality-guided data mining framework, rooted primarily in optical character recognition (OCR), that extracts statistical features from images as a second modality to enhance performance; we term it OANet (Ocr-Aoi-Net).
Advanced Knowledge Extraction of Physical Design Drawings, Translation and conversion to CAD formats using Deep Learning
The approach employs object detection models, such as YOLOv7 and Faster R-CNN, to detect the physical drawing objects present in the images, followed by edge detection algorithms, such as the Canny filter, to extract and refine the identified lines from the drawing region, and curve detection techniques to detect circles.
TextBlockV2: Towards Precise-Detection-Free Scene Text Spotting with Pre-trained Language Model
By taking advantage of a language model fine-tuned on scene recognition benchmarks and of the text block detection paradigm, our scene text spotter achieves superior performance across multiple public benchmarks in extensive experiments.