TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Emotion Classification	SemEval 2018 Task 1E-c	Transformer (finetune)	Macro-F1	0.561	# 3
Sentiment Analysis	SST-2 Binary classification	Transformer (finetune)	Accuracy	90.9	# 59

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/practical-text-classification-with-large-pre/emotion-classification-on-semeval-2018-task)](https://paperswithcode.com/sota/emotion-classification-on-semeval-2018-task?p=practical-text-classification-with-large-pre)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/practical-text-classification-with-large-pre/sentiment-analysis-on-sst-2-binary)](https://paperswithcode.com/sota/sentiment-analysis-on-sst-2-binary?p=practical-text-classification-with-large-pre)`

Practical Text Classification With Large Pre-Trained Language Models

4 Dec 2018 · Neel Kant, Raul Puri, Nikolai Yakovenko, Bryan Catanzaro

Multi-emotion sentiment classification is a natural language processing (NLP) problem with valuable use cases on real-world data. We demonstrate that large-scale unsupervised language modeling combined with fine-tuning offers a practical solution to this task on difficult datasets, including those with label class imbalance and domain-specific context. By training an attention-based Transformer network (Vaswani et al. 2017) on 40 GB of text (Amazon reviews) (McAuley et al. 2015) and fine-tuning on the training set, our model achieves a 0.69 F1 score on the SemEval Task 1:E-c multidimensional emotion classification problem (Mohammad et al. 2018), based on the Plutchik wheel of emotions (Plutchik 1979). These results are competitive with state-of-the-art models, including strong F1 scores on difficult (emotion) categories such as Fear (0.73), Disgust (0.77) and Anger (0.78), as well as competitive results on rare categories such as Anticipation (0.42) and Surprise (0.37). Furthermore, we demonstrate our application on a real-world text classification task. We create a narrowly collected text dataset of real tweets on several topics, and show that our fine-tuned model outperforms general-purpose commercially available APIs for sentiment and multidimensional emotion classification on this dataset by a significant margin. We also perform a variety of additional studies, investigating properties of deep learning architectures, datasets and algorithms for achieving practical multidimensional sentiment classification. Overall, we find that unsupervised language modeling and fine-tuning is a simple framework for achieving high-quality results on real-world sentiment classification.

Code

NVIDIA/sentiment-discovery

1,057

Tasks

Classification

Emotion Classification

General Classification

Language Modelling

Sentiment Analysis

Sentiment Classification

text-classification

Text Classification

Datasets

GLUE

SST

SST-2

Results from the Paper

Ranked #3 on Emotion Classification on SemEval 2018 Task 1E-c (Macro-F1 metric)

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Emotion Classification	SemEval 2018 Task 1E-c	Transformer (finetune)	Macro-F1	0.561	# 3
Sentiment Analysis	SST-2 Binary classification	Transformer (finetune)	Accuracy	90.9	# 59
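
The Macro-F1 metric in the table averages per-label F1 scores with equal weight, so rare emotion categories count as much as frequent ones. The following is a minimal sketch of that computation for multi-label predictions, not the official SemEval scorer: it assumes each example is a binary indicator vector over the labels, and it assumes a label with no true and no predicted positives scores 0. The function names and the toy data are hypothetical and do not come from the paper.

```python
from typing import Sequence


def f1_for_label(y_true: Sequence[Sequence[int]],
                 y_pred: Sequence[Sequence[int]],
                 label_idx: int) -> float:
    """F1 for one label, treating it as a binary problem across all examples."""
    tp = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        t, p = truth[label_idx], pred[label_idx]
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif t and not p:
            fn += 1
    denom = 2 * tp + fp + fn
    # Assumed convention: score 0 when the label never occurs and is never predicted.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[Sequence[int]],
             y_pred: Sequence[Sequence[int]]) -> float:
    """Unweighted mean of the per-label F1 scores."""
    n_labels = len(y_true[0])
    scores = [f1_for_label(y_true, y_pred, k) for k in range(n_labels)]
    return sum(scores) / n_labels


# Hypothetical toy data: columns stand for three labels (anger, fear, surprise).
y_true = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 0]]

per_label = [f1_for_label(y_true, y_pred, k) for k in range(3)]
print(per_label, macro_f1(y_true, y_pred))
```

In this toy case the third label is never predicted, so its F1 is 0 and it pulls the macro average down even though the first label is classified perfectly.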

Methods

Absolute Position Encodings • Adam • BPE • Dense Connections • Dropout • Label Smoothing • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • ReLU • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer
