Document Classification

210 papers with code • 19 benchmarks • 15 datasets

Document classification is the task of assigning a document one or more labels drawn from a predetermined label set.

Source: Long-length Legal Document Classification
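
To make the definition concrete, here is a minimal sketch of multi-label assignment from a fixed label set. It uses simple keyword matching as a stand-in for a trained model; the labels, keyword lists, and the `classify` function are hypothetical and are not taken from any paper listed on this page.

```python
import re

# Hypothetical label set and keyword lists, chosen only for illustration.
LABEL_KEYWORDS = {
    "legal": {"contract", "court", "plaintiff"},
    "medical": {"patient", "diagnosis", "clinical"},
    "finance": {"invoice", "loan", "interest"},
}


def classify(document: str, min_hits: int = 1) -> list[str]:
    """Return every label with at least `min_hits` distinct keywords in the document."""
    # Lowercase and split into alphabetic tokens; a set keeps distinct words only.
    tokens = set(re.findall(r"[a-z]+", document.lower()))
    return sorted(
        label
        for label, keywords in LABEL_KEYWORDS.items()
        if len(tokens & keywords) >= min_hits
    )


labels = classify("The court reviewed the patient's clinical records and the contract.")
print(labels)  # ['legal', 'medical']
```

A document may receive zero, one, or several labels, which is what distinguishes the multi-label setting from single-label classification. In practice the keyword rule would be replaced by a learned model that scores each label.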

Latest papers with no code

Learning Section Weights for Multi-Label Document Classification

no code yet • 26 Nov 2023

This problem is crucially important in various domains, such as tagging scientific articles.

Causality is all you need

no code yet • 21 Nov 2023

In this paper, we propose the Causal Graph Routing (CGR) framework, an integrated causal scheme that relies entirely on intervention mechanisms to reveal the cause-effect forces hidden in data.

ATLANTIC: Structure-Aware Retrieval-Augmented Language Model for Interdisciplinary Science

no code yet • 21 Nov 2023

We train a graph neural network on the curated document graph to act as a structural encoder for the corresponding passages retrieved during the model pretraining.

Explainable Text Classification Techniques in Legal Document Review: Locating Rationales without Using Human Annotated Training Text Snippets

no code yet • 15 Nov 2023

While interesting, manually annotating training text snippets is not generally practical during a legal document review.

Enhancing Document Information Analysis with Multi-Task Pre-training: A Robust Approach for Information Extraction in Visually-Rich Documents

no code yet • 25 Oct 2023

The proposed model leverages transformer-based models to encode all the information present in a document image, including textual, visual, and layout information.

A Multi-Modal Multilingual Benchmark for Document Image Classification

no code yet • 25 Oct 2023

Document image classification differs from plain-text document classification: it consists of classifying a document by understanding both the content and the structure of documents such as forms and emails.

An Analysis on Large Language Models in Healthcare: A Case Study of BioBERT

no code yet • 11 Oct 2023

This paper conducts a comprehensive investigation into applying large language models, particularly BioBERT, in healthcare.

KoBigBird-large: Transformation of Transformer for Korean Language Understanding

no code yet • 19 Sep 2023

This work presents KoBigBird-large, a large-sized Korean BigBird model that achieves state-of-the-art performance and allows long-sequence processing for Korean language understanding.

Feature Extraction Using Deep Generative Models for Bangla Text Classification on a New Comprehensive Dataset

no code yet • 21 Aug 2023

The selection of features for text classification is a fundamental task in text mining and information retrieval.

Accelerated materials language processing enabled by GPT

no code yet • 18 Aug 2023

Finally, we develop a GPT-enabled extractive QA model, which provides improved performance and shows the possibility of automatically correcting annotations.