Document Classification

206 papers with code • 19 benchmarks • 15 datasets

Document Classification is the task of assigning one or more labels to a document from a predetermined set of labels.

Source: Long-length Legal Document Classification
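
As a rough illustration of the definition above, the following is a minimal sketch of a single-label document classifier, assuming a bag-of-words representation and multinomial Naive Bayes with add-one smoothing. The class name, the whitespace tokenizer, and the toy training data are all hypothetical and are not taken from any of the papers listed on this page. Assigning more than one label per document would need, for example, one binary classifier per label.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.label_counts = Counter(labels)      # documents seen per label
        self.word_counts = defaultdict(Counter)  # token counts per label
        self.vocabulary = set()
        for text, label in zip(documents, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        # Tokens never seen in training carry no evidence, so drop them.
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            total_words = sum(self.word_counts[label].values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training set with a predetermined label set {"sport", "legal"}.
train_docs = [
    "the team won the football match",
    "a late goal decided the match",
    "the court ruled on the contract dispute",
    "the judge dismissed the legal appeal",
]
train_labels = ["sport", "sport", "legal", "legal"]

classifier = NaiveBayesClassifier().fit(train_docs, train_labels)
prediction = classifier.predict("the judge ruled on the appeal")
print(prediction)
```

The models on the leaderboards below replace the bag-of-words features with learned representations, but the input and output of the task are the same: a document goes in and a label from a fixed set comes out.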

Benchmarks

These leaderboards are used to track progress in Document Classification.

Dataset	Best Model
Reuters-21578	MPAD-path
Cora	ACNet
HOC	BioLinkBERT (large)
BBCSport	MPAD-path
Amazon	ApproxRepSet
Twitter	ApproxRepSet
WOS-5736	ConvTextTM
IMDb-M	Document Classification Using Importance of Sentences
AAPD	KD-LSTMreg
Classic	REL-RWMD k-NN
Recipe	ApproxRepSet
SciDocs (MAG)	SciNCL
SciDocs (MeSH)	SciNCL
WOS-11967	RMDL (30 RDLs)
WOS-46985	RMDL (30 RDLs)
Yelp-14	KD-LSTMreg
Reuters En-De	BilBOWA
Reuters De-En	BilBOWA
MPQA	MPAD-path

Libraries

Use these libraries to find Document Classification models and implementations.

Library	Papers	GitHub stars
huggingface/transformers	2 papers	124,527
sergioburdisso/pyss3	2 papers	331
eske/multivec	2 papers	116
IllinoisGraphBenchmark/IGB-Datasets	2 papers

See all 6 libraries.

Datasets

Subtasks

Page Stream Segmentation

Most implemented papers

Geometric deep learning on graphs and manifolds using mixture model CNNs

dmlc/dgl • CVPR 2017

Recently, there has been an increasing interest in geometric deep learning, attempting to generalize deep learning methods to non-Euclidean structured data such as graphs and manifolds, with a variety of applications from the domains of network analysis, computational social science, or computer graphics.

Learning to Skim Text

tsujuifu/pytorch_lstm-shuttle • ACL 2017

Recurrent Neural Networks are showing much promise in many sub-areas of natural language processing, ranging from document classification to machine translation to automatic question answering.

Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets

ncbi-nlp/NCBI_BERT • WS 2019

MultiFiT: Efficient Multi-lingual Language Model Fine-tuning

n-waves/multifit • IJCNLP 2019

Pretrained language models are promising particularly for low-resource languages as they only require unlabelled data.

Pairwise Multi-Class Document Classification for Semantic Relations between Wikipedia Articles

malteos/semantic-document-relations • 22 Mar 2020

In this paper, we model the problem of finding the relationship between two documents as a pairwise document classification task.

HDLTex: Hierarchical Deep Learning for Text Classification

kk7nc/HDLTex • 24 Sep 2017

Along with the growth in the number of documents has come an increase in the number of categories.

Combining Similarity Features and Deep Representation Learning for Stance Detection in the Context of Checking Fake News

LuisPB7/fnc-msc • 2 Nov 2018

Specifically, we use bi-directional Recurrent Neural Networks, together with max-pooling over the temporal/sequential dimension and neural attention, for representing (i) the headline, (ii) the first two sentences of the news article, and (iii) the entire news article.

DocBERT: BERT for Document Classification

castorini/hedwig • 17 Apr 2019

We present, to our knowledge, the first application of BERT to document classification.

Multimodal deep networks for text and image-based document classification

Quicksign/ocrized-text-dataset • 15 Jul 2019

Classification of document images is a critical step for archival of old manuscripts, online subscription and administrative procedures.

Hierarchical Transformers for Long Document Classification

helmy-elrais/RoBERT_Recurrence_over_BERT • 23 Oct 2019

BERT, which stands for Bidirectional Encoder Representations from Transformers, is a recently introduced language representation model based upon the transfer learning paradigm.
