Document Classification
210 papers with code • 19 benchmarks • 15 datasets
Document classification is the task of assigning one or more labels to a document from a predetermined set of labels.
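As a rough illustration of the task's input and output, the sketch below assigns zero or more labels from a fixed set to a document by counting keyword matches. The label set, the keyword lists, and the `classify` function are hypothetical and chosen only for this example; the systems in the papers listed here use learned models, not keyword rules.

```python
from collections import Counter

# Hypothetical label set with hand-picked keywords (illustration only).
LABEL_KEYWORDS = {
    "legal": {"contract", "court", "liability"},
    "medical": {"patient", "clinical", "diagnosis"},
    "finance": {"invoice", "payment", "tax"},
}


def classify(document: str, min_hits: int = 1) -> list[str]:
    """Return every label whose keywords occur at least min_hits times."""
    # Lowercase and split on whitespace; a real system would tokenize properly.
    tokens = Counter(document.lower().split())
    labels = []
    for label, keywords in LABEL_KEYWORDS.items():
        hits = sum(tokens[word] for word in keywords)
        if hits >= min_hits:
            labels.append(label)
    return labels


print(classify("The court reviewed the patient diagnosis and the contract"))
# → ['legal', 'medical']
```

Because a document can match several labels or none, the function returns a list, which is what distinguishes multi-label classification from the single-label case.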
Latest papers with no code
Learning Section Weights for Multi-Label Document Classification
This problem is crucially important in various domains, such as tagging scientific articles.
Causality is all you need
In this paper, we propose the Causal Graph Routing (CGR) framework, an integrated causal scheme relying entirely on the intervention mechanisms to reveal the cause-effect forces hidden in data.
ATLANTIC: Structure-Aware Retrieval-Augmented Language Model for Interdisciplinary Science
We train a graph neural network on the curated document graph to act as a structural encoder for the corresponding passages retrieved during the model pretraining.
Explainable Text Classification Techniques in Legal Document Review: Locating Rationales without Using Human Annotated Training Text Snippets
While interesting, manually annotating training text snippets is not generally practical during a legal document review.
Enhancing Document Information Analysis with Multi-Task Pre-training: A Robust Approach for Information Extraction in Visually-Rich Documents
The proposed model leverages transformer-based models to encode all the information present in a document image, including textual, visual, and layout information.
A Multi-Modal Multilingual Benchmark for Document Image Classification
Document image classification differs from plain-text document classification: it requires understanding both the content and the structure of documents such as forms and emails.
An Analysis on Large Language Models in Healthcare: A Case Study of BioBERT
This paper conducts a comprehensive investigation into applying large language models, particularly BioBERT, in healthcare.
KoBigBird-large: Transformation of Transformer for Korean Language Understanding
This work presents KoBigBird-large, a large version of Korean BigBird that achieves state-of-the-art performance and supports long-sequence processing for Korean language understanding.
Feature Extraction Using Deep Generative Models for Bangla Text Classification on a New Comprehensive Dataset
The selection of features for text classification is a fundamental task in text mining and information retrieval.
Accelerated materials language processing enabled by GPT
Finally, we develop a GPT-enabled extractive QA model, which provides improved performance and shows the possibility of automatically correcting annotations.