Optical Character Recognition (OCR)

311 papers with code • 5 benchmarks • 42 datasets

Optical Character Recognition or Optical Character Reader (OCR) is the electronic or mechanical conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene photo (for example, the text on signs and billboards in a landscape photo, or license plates on cars) or from subtitle text superimposed on an image (for example, from a television broadcast).

Latest papers with no code

TC-OCR: TableCraft OCR for Efficient Detection & Recognition of Table Structure & Content

no code yet • 16 Apr 2024

Our proposed approach achieves an IoU of 0.96 and an OCR Accuracy of 78%, showcasing a remarkable improvement of approximately 25% in the OCR Accuracy compared to the previous Table Transformer approach.
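
For readers unfamiliar with the metric quoted above, intersection over union (IoU) is the area of overlap between a predicted and a reference region divided by the area of their union. The following is a minimal sketch, assuming axis-aligned boxes given as (x1, y1, x2, y2); the function name `box_iou` and the box format are illustrative assumptions, and the snippet above does not state the paper's exact evaluation protocol.

```python
def box_iou(a, b):
    """Return the IoU of two axis-aligned boxes given as (x1, y1, x2, y2).

    Assumed format for illustration only: x1 <= x2 and y1 <= y2.
    """
    # Corners of the intersection rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])

    # Clamp at zero so disjoint boxes contribute no overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Two degenerate (zero-area) boxes have no meaningful overlap.
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    predicted = (0, 0, 2, 2)
    reference = (1, 1, 3, 3)
    print(box_iou(predicted, reference))
```

Under this definition a value of 1.0 means the two boxes coincide exactly and 0.0 means they do not overlap at all.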

Resilience of Large Language Models for Noisy Instructions

no code yet • 15 Apr 2024

In the rapidly advancing domain of natural language processing (NLP), large language models (LLMs) have emerged as powerful tools for interpreting human commands and generating text across various tasks.

Convolution-based Probability Gradient Loss for Semantic Segmentation

no code yet • 10 Apr 2024

In this paper, we introduce a novel Convolution-based Probability Gradient (CPG) loss for semantic segmentation.

VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?

no code yet • 9 Apr 2024

Multimodal large language models (MLLMs) have shown promise in web-related tasks, but evaluating their performance in the web domain remains a challenge due to the lack of comprehensive benchmarks.

Making Old Kurdish Publications Processable by Augmenting Available Optical Character Recognition Engines

no code yet • 9 Apr 2024

An extensive dataset is crucial for developing OCR systems with reasonable accuracy, but no public datasets are currently available for historical Kurdish documents, which posed a significant challenge in our work.

HAMMR: HierArchical MultiModal React agents for generic VQA

no code yet • 8 Apr 2024

We start from a multimodal ReAct-based system and make it hierarchical by enabling our HAMMR agents to call upon other specialized agents.

Design and Development of a Framework For Stroke-Based Handwritten Gujarati Font Generation

no code yet • 4 Apr 2024

The generation phase involves the user providing a small subset of characters, and the system automatically generates the remaining character glyphs based on extracted strokes and learned rules, resulting in handwritten Gujarati fonts.

Optical Text Recognition in Nepali and Bengali: A Transformer-based Approach

no code yet • 3 Apr 2024

Efforts on the research and development of OCR systems for Low-Resource Languages are relatively new.

RealKIE: Five Novel Datasets for Enterprise Key Information Extraction

no code yet • 29 Mar 2024

We introduce RealKIE, a benchmark of five challenging datasets aimed at advancing key information extraction methods, with an emphasis on enterprise applications.