Document Classification

210 papers with code • 19 benchmarks • 15 datasets

Document Classification is the task of assigning one or more labels to a document from a predetermined set of labels.

Source: Long-length Legal Document Classification
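This minimal sketch shows the difference between assigning one label and assigning "one or more labels". It is not one of the leaderboard models. It swaps in multinomial Naive Bayes, a classic text-classification baseline, for the single-label case. It handles the multi-label case with one-vs-rest, training one present/absent classifier per label. All class and function names and the toy documents are hypothetical, and tokenization is naive whitespace splitting.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenization (a simplifying assumption).
    return text.lower().split()


class NaiveBayesDocClassifier:
    """Hypothetical multinomial Naive Bayes over bag-of-words counts; one label per document."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace (add-alpha) smoothing constant

    def fit(self, docs, labels):
        self.labels = sorted(set(labels))
        self.doc_counts = Counter(labels)  # documents per label, used for priors
        self.word_counts = {c: Counter() for c in self.labels}
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.labels}
        return self

    def log_scores(self, doc):
        # log P(c) + sum of log P(token | c) over tokens seen in training.
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        scores = {}
        for c in self.labels:
            score = math.log(self.doc_counts[c] / n_docs)
            denom = self.total_words[c] + self.alpha * v
            for tok in tokenize(doc):
                if tok in self.vocab:  # unseen tokens are skipped
                    score += math.log((self.word_counts[c][tok] + self.alpha) / denom)
            scores[c] = score
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


def fit_one_vs_rest(docs, label_sets, label_space):
    # Multi-label: one binary (label present = True / absent = False) model per label.
    return {
        label: NaiveBayesDocClassifier().fit(docs, [label in s for s in label_sets])
        for label in label_space
    }


def predict_multilabel(models, doc):
    # Assign every label whose binary model predicts "present".
    return {label for label, model in models.items() if model.predict(doc)}


# Single-label usage on toy data.
docs = [
    "the striker scored a late goal in the football match",
    "the team won the league after a penalty shootout",
    "parliament passed the new budget bill",
    "the minister announced tax reform in parliament",
]
labels = ["sport", "sport", "politics", "politics"]
clf = NaiveBayesDocClassifier().fit(docs, labels)
print(clf.predict("a goal in the final match"))  # sport
print(clf.predict("tax bill in parliament"))  # politics

# Multi-label usage: a document may carry several labels.
ml_docs = ["goal match", "goal transfer fee", "vote parliament", "vote tax bank", "bank shares"]
ml_labels = [{"sport"}, {"sport", "finance"}, {"politics"}, {"politics", "finance"}, {"finance"}]
models = fit_one_vs_rest(ml_docs, ml_labels, ["sport", "politics", "finance"])
print(sorted(predict_multilabel(models, "parliament tax bank")))  # ['finance', 'politics']
```

One-vs-rest treats each label independently, which keeps the sketch simple but ignores correlations between labels.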

Benchmarks

These leaderboards track progress in Document Classification.

Dataset	Best Model
Reuters-21578	MPAD-path
Cora	ACNet
HOC	BioLinkBERT (large)
BBCSport	MPAD-path
Amazon	ApproxRepSet
Twitter	ApproxRepSet
WOS-5736	ConvTextTM
IMDb-M	Document Classification Using Importance of Sentences
AAPD	KD-LSTMreg
Classic	REL-RWMD k-NN
Recipe	ApproxRepSet
SciDocs (MAG)	SciNCL
SciDocs (MeSH)	SciNCL
WOS-11967	RMDL (30 RDLs)
WOS-46985	RMDL (30 RDLs)
Yelp-14	KD-LSTMreg
Reuters En-De	BilBOWA
Reuters De-En	BilBOWA
MPQA	MPAD-path


Libraries

Use these libraries to find Document Classification models and implementations.

huggingface/transformers • 2 papers • 125,425 GitHub stars
sergioburdisso/pyss3 • 2 papers • 331 GitHub stars
eske/multivec • 2 papers • 116 GitHub stars
IllinoisGraphBenchmark/IGB-Datasets • 2 papers

Showing 4 of 6 libraries.

Subtasks

Page Stream Segmentation

Latest papers with no code

Learning Section Weights for Multi-Label Document Classification

no code yet • 26 Nov 2023

This problem is crucially important in various domains, such as tagging scientific articles.

Causality is all you need

no code yet • 21 Nov 2023

In this paper, we propose the Causal Graph Routing (CGR) framework, an integrated causal scheme relying entirely on the intervention mechanisms to reveal the cause-effect forces hidden in data.

ATLANTIC: Structure-Aware Retrieval-Augmented Language Model for Interdisciplinary Science

no code yet • 21 Nov 2023

We train a graph neural network on the curated document graph to act as a structural encoder for the corresponding passages retrieved during the model pretraining.

Explainable Text Classification Techniques in Legal Document Review: Locating Rationales without Using Human Annotated Training Text Snippets

no code yet • 15 Nov 2023

While interesting, manually annotating training text snippets is not generally practical during a legal document review.

Enhancing Document Information Analysis with Multi-Task Pre-training: A Robust Approach for Information Extraction in Visually-Rich Documents

no code yet • 25 Oct 2023

The proposed model leverages transformer-based models to encode all the information present in a document image, including textual, visual, and layout information.

A Multi-Modal Multilingual Benchmark for Document Image Classification

no code yet • 25 Oct 2023

Document image classification is different from plain-text document classification: it classifies a document by understanding both its content and its structure, as with forms, emails, and similar documents.

An Analysis on Large Language Models in Healthcare: A Case Study of BioBERT

no code yet • 11 Oct 2023

This paper conducts a comprehensive investigation into applying large language models, particularly BioBERT, in healthcare.

KoBigBird-large: Transformation of Transformer for Korean Language Understanding

no code yet • 19 Sep 2023

This work presents KoBigBird-large, a large-sized Korean BigBird model that achieves state-of-the-art performance and supports long-sequence processing for Korean language understanding.

Feature Extraction Using Deep Generative Models for Bangla Text Classification on a New Comprehensive Dataset

no code yet • 21 Aug 2023

The selection of features for text classification is a fundamental task in text mining and information retrieval.

Accelerated materials language processing enabled by GPT

no code yet • 18 Aug 2023

Finally, we develop a GPT-enabled extractive QA model, which provides improved performance and shows the possibility of automatically correcting annotations.
