Document Classification

209 papers with code • 19 benchmarks • 15 datasets

Document classification is the task of assigning one or more labels to a document from a predetermined set of labels.

Source: Long-length Legal Document Classification

Most implemented papers

Modular Multimodal Architecture for Document Classification

microsoft/unilm 9 Dec 2019

Page classification is a crucial component of any document analysis system, allowing for complex branching control flows for different components of a given document.

BilBOWA: Fast Bilingual Distributed Representations without Word Alignments

gouwsmeister/bilbowa 9 Oct 2014

We introduce BilBOWA (Bilingual Bag-of-Words without Alignments), a simple and computationally efficient model for learning bilingual distributed representations of words that scales to large monolingual datasets and does not require word-aligned parallel training data.

Naive Bayes and Text Classification I - Introduction and Theory

Xue-Alex/sentiment-analysis 16 Oct 2014

Naive Bayes classifiers, a family of classifiers based on Bayes' theorem, are known for producing simple yet well-performing models, especially in the fields of document classification and disease prediction.
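As an illustration of the idea in this entry, here is a minimal sketch of a multinomial Naive Bayes document classifier with add-one (Laplace) smoothing. It is not code from the linked repository; the function names `train_naive_bayes` and `predict`, the toy documents, and the `finance`/`medical` labels are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Estimate log priors and Laplace-smoothed log likelihoods.

    docs is a list of token lists; labels is a parallel list of class names.
    """
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> word -> count
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)

    n_docs = len(docs)
    log_prior = {c: math.log(n / n_docs) for c, n in class_counts.items()}

    log_likelihood = {}
    for c in class_counts:
        # Add-one smoothing: every vocabulary word gets one extra count.
        total = sum(word_counts[c].values()) + len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + 1) / total) for w in vocab
        }
    return log_prior, log_likelihood


def predict(tokens, log_prior, log_likelihood):
    """Return the class with the highest posterior log score.

    Words never seen during training are ignored.
    """
    scores = {}
    for c in log_prior:
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][w] for w in tokens if w in log_likelihood[c]
        )
    return max(scores, key=scores.get)


# Toy training data (hypothetical, for illustration only).
train_docs = [
    "invoice payment due amount".split(),
    "payment received invoice total".split(),
    "patient diagnosis treatment dose".split(),
    "treatment plan patient symptoms".split(),
]
train_labels = ["finance", "finance", "medical", "medical"]

log_prior, log_likelihood = train_naive_bayes(train_docs, train_labels)
print(predict("invoice amount".split(), log_prior, log_likelihood))
print(predict("patient dose".split(), log_prior, log_likelihood))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, and the smoothing keeps a single unseen word from forcing a class probability to zero.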

Multi-layer Representation Learning for Medical Concepts

mp2893/med2vec 17 Feb 2016

Learning efficient representations for concepts has been proven to be an important basis for many applications such as machine translation or document classification.

WikiReading: A Novel Large-scale Language Understanding Task over Wikipedia

google-research-datasets/wiki-reading ACL 2016

The task contains a rich variety of challenging classification and extraction sub-tasks, making it well-suited for end-to-end models such as deep neural networks (DNNs).

Multilingual Hierarchical Attention Networks for Document Classification

idiap/mhan IJCNLP 2017

Hierarchical attention networks have recently achieved remarkable performance for document classification in a given language.

Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms

dinghanshen/SWEM ACL 2018

Many deep learning architectures have been proposed to model the compositionality in text sequences, requiring a substantial number of parameters and expensive computations.
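The title of this entry refers to pooling over word embeddings. As a rough sketch of what that can look like (not the authors' implementation), the snippet below builds a fixed-size document vector by average pooling and by max pooling over per-word vectors. The three-dimensional `toy_embeddings` values and the function names are hypothetical.

```python
def average_pool(vectors):
    """Element-wise mean over a list of equal-length word vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def max_pool(vectors):
    """Element-wise maximum over a list of equal-length word vectors."""
    dim = len(vectors[0])
    return [max(v[i] for v in vectors) for i in range(dim)]


# Hypothetical toy embeddings; real models use learned, higher-dimensional vectors.
toy_embeddings = {
    "good": [1.0, 0.0, 2.0],
    "movie": [0.0, 4.0, -2.0],
}

word_vectors = [toy_embeddings[w] for w in ["good", "movie"]]
doc_avg = average_pool(word_vectors)  # [0.5, 2.0, 0.0]
doc_max = max_pool(word_vectors)      # [1.0, 4.0, 2.0]
```

Either pooled vector can then be fed to a simple classifier. Pooling itself adds no trainable parameters, which is the contrast with the heavier architectures the abstract mentions.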

A Corpus for Multilingual Document Classification in Eight Languages

facebookresearch/MLDoc LREC 2018

We have observed that the class prior distributions differ significantly between the languages.

Multiview Boosting by Controlling the Diversity and the Accuracy of View-specific Voters

goyalanil/PB-MVBoost 17 Aug 2018

Experiments on three publicly available datasets show the efficiency of the proposed approach with respect to state-of-the-art models.

Auto-Encoding Dictionary Definitions into Consistent Word Embeddings

tombosc/cpae EMNLP 2018

Monolingual dictionaries are widespread and semantically rich resources.