Document Classification
209 papers with code • 19 benchmarks • 15 datasets
Document classification is the task of assigning one or more labels from a predetermined set to a document.
Most implemented papers
Modular Multimodal Architecture for Document Classification
Page classification is a crucial component of any document analysis system, allowing for complex branching control flows for different components of a given document.
BilBOWA: Fast Bilingual Distributed Representations without Word Alignments
We introduce BilBOWA (Bilingual Bag-of-Words without Alignments), a simple and computationally-efficient model for learning bilingual distributed representations of words which can scale to large monolingual datasets and does not require word-aligned parallel training data.
Naive Bayes and Text Classification I - Introduction and Theory
Naive Bayes classifiers, a family of classifiers based on Bayes' theorem, are known for creating simple yet well-performing models, especially in the fields of document classification and disease prediction.
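As a rough illustration of how a Naive Bayes classifier can label documents, the sketch below implements a multinomial variant with add-one (Laplace) smoothing in plain Python. It is not taken from the listed article: the function names `train_naive_bayes` and `predict`, the `alpha` parameter, the whitespace tokenization, and the four toy training documents are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on tokenized documents.

    alpha is the additive smoothing constant (hypothetical default of 1.0).
    """
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)

    n_docs = len(docs)
    # log P(class), estimated from class frequencies in the training set
    log_prior = {c: math.log(n / n_docs) for c, n in class_counts.items()}

    # log P(word | class) with additive smoothing over the vocabulary
    log_likelihood = {}
    for c in class_counts:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return log_prior, log_likelihood, vocab


def predict(model, tokens):
    """Return the class with the highest posterior log-score."""
    log_prior, log_likelihood, vocab = model
    scores = {}
    for c in log_prior:
        score = log_prior[c]
        for w in tokens:
            if w in vocab:  # words unseen in training are ignored
                score += log_likelihood[c][w]
        scores[c] = score
    return max(scores, key=scores.get)


# Toy training data, invented for this example
train_docs = [
    "invoice payment due amount".split(),
    "payment receipt amount total".split(),
    "patient diagnosis treatment dose".split(),
    "clinical patient symptoms treatment".split(),
]
train_labels = ["finance", "finance", "medical", "medical"]

model = train_naive_bayes(train_docs, train_labels)
print(predict(model, "amount due on invoice".split()))
print(predict(model, "treatment for the patient".split()))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, and the smoothing term keeps a word that never appeared with a class from forcing that class's probability to zero.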
Multi-layer Representation Learning for Medical Concepts
Learning efficient representations for concepts has been proven to be an important basis for many applications such as machine translation or document classification.
WikiReading: A Novel Large-scale Language Understanding Task over Wikipedia
The task contains a rich variety of challenging classification and extraction sub-tasks, making it well-suited for end-to-end models such as deep neural networks (DNNs).
Multilingual Hierarchical Attention Networks for Document Classification
Hierarchical attention networks have recently achieved remarkable performance for document classification in a given language.
Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms
Many deep learning architectures have been proposed to model the compositionality in text sequences, requiring a substantial number of parameters and expensive computations.
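For contrast with such parameter-heavy architectures, the sketch below shows the general idea of parameter-free pooling over word embeddings: a document is represented by the element-wise average or maximum of its word vectors. This is only an illustration of pooling, not the paper's exact models. The helper names `embed`, `average_pool`, and `max_pool` and the three-dimensional toy embedding values are assumptions made for this example.

```python
def average_pool(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def max_pool(vectors):
    """Element-wise maximum of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [max(v[i] for v in vectors) for i in range(dim)]


# Toy 3-dimensional embeddings with made-up values; a real system would
# load pretrained vectors instead.
embeddings = {
    "good": [0.9, 0.1, 0.0],
    "movie": [0.2, 0.8, 0.1],
    "bad": [-0.7, 0.1, 0.3],
}


def embed(tokens):
    """Look up each known token; out-of-vocabulary tokens are skipped."""
    return [embeddings[t] for t in tokens if t in embeddings]


vectors = embed(["good", "movie"])
doc_avg = average_pool(vectors)  # fixed-size document representation
doc_max = max_pool(vectors)      # keeps the strongest value per dimension
print(doc_avg, doc_max)
```

Either pooled vector has a fixed size regardless of document length, so it can be passed to any standard classifier, and the pooling step itself has no trainable parameters.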
A Corpus for Multilingual Document Classification in Eight Languages
In addition, we have observed that the class prior distributions differ significantly between the languages.
Multiview Boosting by Controlling the Diversity and the Accuracy of View-specific Voters
Experiments on three publicly available datasets show the efficiency of the proposed approach with respect to state-of-the-art models.
Auto-Encoding Dictionary Definitions into Consistent Word Embeddings
Monolingual dictionaries are widespread and semantically rich resources.