Text Classification
1107 papers with code • 93 benchmarks • 136 datasets
Text Classification is the task of assigning a sentence or document an appropriate category. The categories depend on the chosen dataset and can range from broad topics to fine-grained labels.
Text Classification problems include emotion classification, news classification, and citation intent classification, among others. Benchmark datasets for evaluating text classification capabilities include GLUE and AGNews, among others.
In recent years, deep learning techniques like XLNet and RoBERTa have attained some of the biggest performance jumps for text classification problems.
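Before the deep learning era, simple probabilistic baselines such as Naive Bayes over bag-of-words features were the standard approach, and they remain a useful reference point. Below is a minimal, self-contained sketch of a multinomial Naive Bayes text classifier with add-one smoothing; the toy corpus and labels are invented for illustration only.

```python
from collections import Counter, defaultdict
import math

# Toy corpus of (text, label) pairs -- illustrative data only.
train = [
    ("the team won the match", "sports"),
    ("the player scored a goal", "sports"),
    ("stocks rose on strong earnings", "business"),
    ("the market fell after the report", "business"),
]

def tokenize(text):
    return text.lower().split()

# Per-class word frequencies and class priors.
word_counts = defaultdict(Counter)
class_counts = Counter()
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(tokenize(text))

vocab = {w for counts in word_counts.values() for w in counts}

def predict(text):
    """Return the most probable class under multinomial Naive Bayes
    with add-one (Laplace) smoothing."""
    tokens = tokenize(text)
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        # Log prior: P(class).
        score = math.log(class_counts[label] / sum(class_counts.values()))
        total = sum(word_counts[label].values())
        for tok in tokens:
            # Smoothed log likelihood: P(word | class).
            score += math.log((word_counts[label][tok] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

print(predict("the team scored"))       # -> sports
print(predict("earnings report fell"))  # -> business
```

Modern transformer models replace the hand-built features here with learned contextual representations, but the task framing (map text to one label out of a fixed set) is the same.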
(Image credit: Text Classification Algorithms: A Survey)
Libraries
Use these libraries to find Text Classification models and implementations

Subtasks
- Topic Models
- Document Classification
- Sentence Classification
- Emotion Classification
- Multi-Label Text Classification
- Few-Shot Text Classification
- Text Categorization
- Semi-Supervised Text Classification
- Coherence Evaluation
- Toxic Comment Classification
- Citation Intent Classification
- Cross-Domain Text Classification
- Unsupervised Text Classification
- Satire Detection
- Hierarchical Text Classification of Blurbs (GermEval 2019)
- Variable Detection
Most implemented papers
Big Bird: Transformers for Longer Sequences
To remedy this, we propose BigBird, a sparse attention mechanism that reduces this quadratic dependency to linear.
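BigBird combines three sparsity patterns (a sliding window, a few global tokens, and random links) so that each token attends to only O(1) others, making the total cost linear in sequence length. The toy function below builds such a boolean attention mask; it is a sketch of the pattern only, not the paper's block-sparse implementation, and the parameter sizes are illustrative.

```python
import numpy as np

def bigbird_mask(n, window=1, n_global=1, n_random=1, seed=0):
    """Boolean attention mask combining the three BigBird patterns:
    sliding window, global tokens, and random links (toy sizes)."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        mask[i, lo:hi] = True                    # sliding window (incl. self)
        mask[i, rng.choice(n, n_random)] = True  # random links
    mask[:n_global, :] = True                    # global tokens attend everywhere
    mask[:, :n_global] = True                    # ...and are attended by everyone
    return mask

m = bigbird_mask(8)
print(m.sum(), "of", m.size, "entries active")  # far fewer than n*n
```

Each row has roughly `2*window + 1 + n_random + n_global` active entries, so the number of attended pairs grows as O(n) rather than O(n^2).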
A C-LSTM Neural Network for Text Classification
In this work, we combine the strengths of both architectures and propose a novel and unified model called C-LSTM for sentence representation and text classification.
PubMed 200k RCT: a Dataset for Sequential Sentence Classification in Medical Abstracts
First, the majority of datasets for sequential short-text classification (i.e., classification of short texts that appear in sequences) are small: we hope that releasing a new large dataset will help develop more accurate algorithms for this task.
Graph Convolutional Networks for Text Classification
We build a single text graph for a corpus based on word co-occurrence and document word relations, then learn a Text Graph Convolutional Network (Text GCN) for the corpus.
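In Text GCN, word-word edges are weighted by pointwise mutual information (PMI) computed over sliding windows, keeping only positive-PMI pairs. The snippet below sketches that graph-construction step on an invented toy corpus; the window size and documents are illustrative, and the document-word (TF-IDF) edges and the GCN itself are omitted.

```python
from collections import Counter
from itertools import combinations
import math

# Toy corpus -- illustrative documents only.
docs = [
    "price rose sharply",
    "price fell sharply",
    "team won the game",
    "team lost the game",
]

# Count word and word-pair occurrences within a sliding window.
window = 3
word_freq, pair_freq = Counter(), Counter()
n_windows = 0
for doc in docs:
    tokens = doc.split()
    for i in range(max(1, len(tokens) - window + 1)):
        win = set(tokens[i:i + window])
        n_windows += 1
        word_freq.update(win)
        pair_freq.update(frozenset(p) for p in combinations(win, 2))

def pmi(w1, w2):
    """PMI of two words from window co-occurrence statistics."""
    p_pair = pair_freq[frozenset((w1, w2))] / n_windows
    p1, p2 = word_freq[w1] / n_windows, word_freq[w2] / n_windows
    return math.log(p_pair / (p1 * p2))

# Word-word edges: keep positive PMI only, as in the Text GCN adjacency.
edges = {}
for pair in pair_freq:
    w1, w2 = sorted(pair)
    score = pmi(w1, w2)
    if score > 0:
        edges[(w1, w2)] = score

print(sorted(edges))
```

Words that co-occur more often than chance get positively weighted edges; the GCN then propagates label information over this graph.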
Fastformer: Additive Attention Can Be All You Need
In this way, Fastformer can achieve effective context modeling with linear complexity.
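The core primitive behind Fastformer's linear cost is additive attention: instead of comparing every token pair, each token gets a single scalar score, and a softmax over those scores pools the sequence into one global vector in O(n·d). Below is a minimal sketch of that pooling step with NumPy; the scoring vector `w` stands in for a learned parameter, and the full model's query/key/value interactions are not shown.

```python
import numpy as np

def additive_pool(x, w):
    """Pool a (seq_len, d) token matrix into a single (d,) vector.

    One scalar score per token (O(n*d)), softmax over tokens,
    then a weighted sum -- no n x n pairwise attention matrix.
    """
    scores = x @ w / np.sqrt(x.shape[1])  # (seq_len,) token scores
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                  # softmax over tokens
    return alpha @ x                      # attention-weighted average

x = np.random.default_rng(0).normal(size=(5, 4))  # 5 tokens, dim 4
w = np.zeros(4)                                   # placeholder "learned" vector
print(additive_pool(x, w).shape)                  # (4,)
```

With a zero scoring vector the weights are uniform and the result is the plain token average; training shapes `w` so the pool emphasizes informative tokens.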
Simplifying Graph Convolutional Networks
Graph Convolutional Networks (GCNs) and their variants have attracted significant attention and have become the de facto methods for learning graph representations.
FlauBERT: Unsupervised Language Model Pre-training for French
Language models have become a key step to achieve state-of-the-art results in many different Natural Language Processing (NLP) tasks.
Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment
Machine learning algorithms are often vulnerable to adversarial examples that have imperceptible alterations from the original counterparts but can fool the state-of-the-art models.
Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference
Some NLP tasks can be solved in a fully unsupervised fashion by providing a pretrained language model with "task descriptions" in natural language (e.g., Radford et al., 2019).
HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection
We also observe that models that utilize the human rationales for training perform better at reducing unintended bias towards target communities.