Document Image Classification
24 papers with code • 8 benchmarks • 4 datasets
Document image classification is the task of classifying documents based on images of their contents.
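At its core, the task maps a pixel array to one of a fixed set of document classes (e.g. invoice, letter, resume). The following is a minimal illustrative sketch, not any of the models listed below: an untrained linear softmax classifier over flattened grayscale pixels, with hypothetical class names.

```python
import numpy as np

def classify_document(image, weights, labels):
    """Score a flattened grayscale document image against each class
    with a linear softmax layer and return the most likely label."""
    logits = image.reshape(-1) @ weights          # (n_pixels,) @ (n_pixels, n_classes)
    probs = np.exp(logits - logits.max())         # stable softmax
    probs /= probs.sum()
    return labels[int(np.argmax(probs))]

# Toy example: 4x4 "images", 3 hypothetical document classes, random
# (untrained) weights -- real systems replace this linear map with a
# deep CNN or Transformer backbone.
rng = np.random.default_rng(0)
labels = ["invoice", "letter", "resume"]
weights = rng.normal(size=(16, 3))
image = rng.random((4, 4))
print(classify_document(image, weights, labels))
```

The papers below differ mainly in what replaces the linear map: pure-vision CNNs, or multimodal Transformers that also consume OCR text and layout.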
(Image credit: Real-Time Document Image Classification using Deep CNN and Extreme Learning Machines)
Libraries
Use these libraries to find Document Image Classification models and implementations.
Most implemented papers
Revisiting ResNets: Improved Training and Scaling Strategies
Using improved training and scaling strategies, we design a family of ResNet architectures, ResNet-RS, which are 1.7x to 2.7x faster than EfficientNets on TPUs, while achieving similar accuracies on ImageNet.
DiT: Self-supervised Pre-training for Document Image Transformer
We leverage DiT as the backbone network in a variety of vision-based Document AI tasks, including document image classification, document layout analysis, table detection as well as text detection for OCR.
LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding
LiLT can be pre-trained on the structured documents of a single language and then directly fine-tuned on other languages with the corresponding off-the-shelf monolingual/multilingual pre-trained textual models.
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
In this paper, we propose LayoutLMv3 to pre-train multimodal Transformers for Document AI with unified text and image masking.
ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding
Recent years have witnessed the rise and success of pre-training techniques in visually-rich document understanding.
Light-Weighted CNN for Text Classification
As a solution, we introduce a new lightweight architecture based on separable convolutions.
Improving accuracy and speeding up Document Image Classification through parallel systems
This paper presents a study showing the benefits of EfficientNet models over heavier Convolutional Neural Networks (CNNs) for document image classification, an essential problem in the digitalization process of institutions.
Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer
We address the challenging problem of Natural Language Comprehension beyond plain-text documents by introducing the TILT neural network architecture which simultaneously learns layout information, visual features, and textual semantics.
StructuralLM: Structural Pre-training for Form Understanding
Large pre-trained language models achieve state-of-the-art results when fine-tuned on downstream NLP tasks.
DocFormer: End-to-End Transformer for Document Understanding
DocFormer uses text, vision and spatial features and combines them using a novel multi-modal self-attention layer.
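One simple way to picture multimodal self-attention is to fuse the three feature streams into one embedding per token and attend over the sequence. The sketch below does exactly that (concatenation plus standard single-head scaled dot-product attention); it is an illustrative baseline with made-up dimensions, not DocFormer's actual novel attention layer, which combines the modalities differently.

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over a token sequence."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))  # stable softmax
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
n_tokens, d_text, d_vis, d_spatial = 6, 8, 8, 4  # assumed toy dimensions
# One fused embedding per token: OCR text features + visual patch
# features + spatial (bounding-box) features, simply concatenated.
fused = np.concatenate([
    rng.normal(size=(n_tokens, d_text)),
    rng.normal(size=(n_tokens, d_vis)),
    rng.normal(size=(n_tokens, d_spatial)),
], axis=1)
d = fused.shape[1]
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(fused, wq, wk, wv)
print(out.shape)  # one contextualized vector per token
```

Models like DocFormer improve on this naive concatenation by injecting spatial information directly into the attention computation rather than only into the input embeddings.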