Document Classification

209 papers with code • 19 benchmarks • 15 datasets

Document classification is the task of assigning one or more labels to a document from a predetermined set of labels.

Source: Long-length Legal Document Classification
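As a rough illustration of the definition above, the following sketch assigns zero, one, or several labels from a fixed label set to a document. The label names, the keyword lists, and the `classify` helper are all hypothetical and chosen only for this example; practical systems learn the mapping from data rather than relying on hand-written keywords.

```python
# Hypothetical label set and keyword lists, chosen only for illustration.
LABEL_KEYWORDS = {
    "legal": {"court", "contract", "plaintiff"},
    "finance": {"bank", "loan", "interest"},
    "sports": {"match", "goal", "league"},
}


def classify(document: str) -> list[str]:
    """Return every label whose keyword set overlaps the document's tokens."""
    tokens = set(document.lower().split())
    # A document may match several labels (multi-label) or none at all.
    return sorted(
        label for label, words in LABEL_KEYWORDS.items() if tokens & words
    )


print(classify("The court reviewed the loan contract"))
```

The document in the usage line matches keywords from two label sets, so it receives two labels; a document that matches none receives an empty list.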

Benchmarks


These leaderboards track progress in document classification.

Dataset	Best Model
Reuters-21578	MPAD-path
Cora	ACNet
HOC	BioLinkBERT (large)
BBCSport	MPAD-path
Amazon	ApproxRepSet
Twitter	ApproxRepSet
WOS-5736	ConvTextTM
IMDb-M	Document Classification Using Importance of Sentences
AAPD	KD-LSTMreg
Classic	REL-RWMD k-NN
Recipe	ApproxRepSet
SciDocs (MAG)	SciNCL
SciDocs (MeSH)	SciNCL
WOS-11967	RMDL (30 RDLs)
WOS-46985	RMDL (30 RDLs)
Yelp-14	KD-LSTMreg
Reuters En-De	BilBOWA
Reuters De-En	BilBOWA
MPQA	MPAD-path

Libraries

Use these libraries to find document classification models and implementations.

huggingface/transformers • 2 papers • 125,334

sergioburdisso/pyss3 • 2 papers • 331

eske/multivec • 2 papers • 116

IllinoisGraphBenchmark/IGB-Datasets • 2 papers

Subtasks

Page Stream Segmentation

Most implemented papers

Modular Multimodal Architecture for Document Classification

microsoft/unilm • 9 Dec 2019

Page classification is a crucial component of any document analysis system, allowing for complex branching control flows for different components of a given document.

BilBOWA: Fast Bilingual Distributed Representations without Word Alignments

gouwsmeister/bilbowa • 9 Oct 2014

We introduce BilBOWA (Bilingual Bag-of-Words without Alignments), a simple and computationally-efficient model for learning bilingual distributed representations of words which can scale to large monolingual datasets and does not require word-aligned parallel training data.

Naive Bayes and Text Classification I - Introduction and Theory

Xue-Alex/sentiment-analysis • 16 Oct 2014

Naive Bayes classifiers, a family of classifiers based on Bayes' theorem, are known for producing simple yet well-performing models, especially in the fields of document classification and disease prediction.
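To make the idea concrete, here is a minimal sketch of one member of that family, a multinomial Naive Bayes text classifier with add-one (Laplace) smoothing, written with the Python standard library only. The choice of the multinomial variant, the whitespace tokenizer, the function names `train` and `predict`, and the toy training documents are assumptions made for illustration and are not taken from the listed paper or repository.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """Collect per-label document and word counts from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def predict(text, model):
    """Return the label with the highest log-posterior score."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label, n_docs in label_counts.items():
        # Log prior: fraction of training documents carrying this label.
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # skip words never seen during training
            # Add-one smoothing keeps unseen (label, word) pairs non-zero.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Toy training data, invented for this example.
model = train([
    ("cheap pills buy now", "spam"),
    ("buy cheap watches", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project meeting notes", "ham"),
])
print(predict("cheap pills", model))
print(predict("monday meeting", model))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.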

Multi-layer Representation Learning for Medical Concepts

mp2893/med2vec • 17 Feb 2016

Learning efficient representations for concepts has been proven to be an important basis for many applications such as machine translation or document classification.

WikiReading: A Novel Large-scale Language Understanding Task over Wikipedia

google-research-datasets/wiki-reading • ACL 2016

The task contains a rich variety of challenging classification and extraction sub-tasks, making it well-suited for end-to-end models such as deep neural networks (DNNs).

Multilingual Hierarchical Attention Networks for Document Classification

idiap/mhan • IJCNLP 2017

Hierarchical attention networks have recently achieved remarkable performance for document classification in a given language.

Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms

dinghanshen/SWEM • ACL 2018

Many deep learning architectures have been proposed to model the compositionality in text sequences, requiring a substantial number of parameters and expensive computations.
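By contrast, the pooling mechanisms named in the title need no compositional parameters at all. The sketch below shows what average pooling and max pooling over word embeddings might look like in plain Python; the two-dimensional embedding table and the function names are hypothetical and are not taken from the dinghanshen/SWEM code.

```python
# Hypothetical 2-dimensional word embeddings, invented for this example.
EMBEDDINGS = {
    "good": [0.9, 0.1],
    "movie": [0.2, 0.5],
    "bad": [-0.8, 0.3],
}


def lookup(tokens):
    """Return the embedding vectors of all tokens found in the table."""
    return [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]


def average_pool(tokens):
    """Element-wise mean of the word vectors in a document."""
    vectors = lookup(tokens)
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def max_pool(tokens):
    """Element-wise maximum of the word vectors in a document."""
    vectors = lookup(tokens)
    return [max(dim) for dim in zip(*vectors)]


print(average_pool(["good", "movie"]))
print(max_pool(["good", "bad"]))
```

Either pooled vector has a fixed size regardless of document length, so it can be passed to any standard classifier.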
