Document Image Classification
24 papers with code • 8 benchmarks • 4 datasets
Document image classification is the task of classifying documents based on images of their contents.
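At its core, the task maps a pixel array to one of a fixed set of document classes (e.g. invoice, letter, resume). The following is a minimal illustrative sketch, not any of the models listed below: an untrained linear softmax classifier over flattened grayscale pixels, with hypothetical class names.

```python
import numpy as np

def classify_document(image, weights, labels):
    """Score a flattened grayscale document image against each class
    with a linear softmax layer and return the most likely label."""
    logits = image.reshape(-1) @ weights          # (n_pixels,) @ (n_pixels, n_classes)
    probs = np.exp(logits - logits.max())         # stable softmax
    probs /= probs.sum()
    return labels[int(np.argmax(probs))]

# Toy example: 4x4 "images", 3 hypothetical document classes, random
# (untrained) weights -- real systems replace this linear map with a
# deep CNN or Transformer backbone.
rng = np.random.default_rng(0)
labels = ["invoice", "letter", "resume"]
weights = rng.normal(size=(16, 3))
image = rng.random((4, 4))
print(classify_document(image, weights, labels))
```

The papers below differ mainly in what replaces the linear map: pure-vision CNNs, or multimodal Transformers that also consume OCR text and layout.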
(Image credit: Real-Time Document Image Classification using Deep CNN and Extreme Learning Machines)
Libraries
Use these libraries to find Document Image Classification models and implementations.
Most implemented papers
Revisiting ResNets: Improved Training and Scaling Strategies
Using improved training and scaling strategies, we design a family of ResNet architectures, ResNet-RS, which are 1.7x to 2.7x faster than EfficientNets on TPUs, while achieving similar accuracies on ImageNet.
DiT: Self-supervised Pre-training for Document Image Transformer
We leverage DiT as the backbone network in a variety of vision-based Document AI tasks, including document image classification, document layout analysis, table detection as well as text detection for OCR.
LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding
LiLT can be pre-trained on the structured documents of a single language and then directly fine-tuned on other languages with the corresponding off-the-shelf monolingual/multilingual pre-trained textual models.
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
In this paper, we propose LayoutLMv3 to pre-train multimodal Transformers for Document AI with unified text and image masking.
ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding
Recent years have witnessed the rise and success of pre-training techniques in visually-rich document understanding.
Light-Weighted CNN for Text Classification
As a solution, we introduce a new lightweight architecture based on separable convolutions.
Improving accuracy and speeding up Document Image Classification through parallel systems
This paper presents a study showing the benefits of EfficientNet models over heavier Convolutional Neural Networks (CNNs) for document image classification, an essential problem in the digitalization process of institutions.
Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer
We address the challenging problem of Natural Language Comprehension beyond plain-text documents by introducing the TILT neural network architecture which simultaneously learns layout information, visual features, and textual semantics.
StructuralLM: Structural Pre-training for Form Understanding
Large pre-trained language models achieve state-of-the-art results when fine-tuned on downstream NLP tasks.
DocFormer: End-to-End Transformer for Document Understanding
DocFormer uses text, vision and spatial features and combines them using a novel multi-modal self-attention layer.
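One simple way to picture multimodal self-attention is to fuse the three feature streams into one embedding per token and attend over the sequence. The sketch below does exactly that (concatenation plus standard single-head scaled dot-product attention); it is an illustrative baseline with made-up dimensions, not DocFormer's actual novel attention layer, which combines the modalities differently.

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over a token sequence."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))  # stable softmax
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
n_tokens, d_text, d_vis, d_spatial = 6, 8, 8, 4  # assumed toy dimensions
# One fused embedding per token: OCR text features + visual patch
# features + spatial (bounding-box) features, simply concatenated.
fused = np.concatenate([
    rng.normal(size=(n_tokens, d_text)),
    rng.normal(size=(n_tokens, d_vis)),
    rng.normal(size=(n_tokens, d_spatial)),
], axis=1)
d = fused.shape[1]
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(fused, wq, wk, wv)
print(out.shape)  # one contextualized vector per token
```

Models like DocFormer improve on this naive concatenation by injecting spatial information directly into the attention computation rather than only into the input embeddings.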