How to Fine-Tune BERT for Text Classification?

14 May 2019 · Chi Sun, Xipeng Qiu, Yige Xu, Xuanjing Huang

Language model pre-training has proven to be useful for learning universal language representations. As a state-of-the-art pre-trained language model, BERT (Bidirectional Encoder Representations from Transformers) has achieved impressive results on many language understanding tasks. In this paper, we conduct exhaustive experiments to investigate different fine-tuning methods of BERT on the text classification task and provide a general solution for BERT fine-tuning. The proposed solution obtains new state-of-the-art results on eight widely studied text classification datasets.
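
The paper's basic setting is standard BERT fine-tuning: encode each document, take the final hidden state of the [CLS] token, and train a softmax classifier on top of it together with the encoder. The sketch below illustrates this setup with the Hugging Face `transformers` library; it is not the authors' code, and the toy data, label count, and hyperparameters (learning rate 2e-5, 3 epochs) are illustrative assumptions rather than the paper's exact configuration.

```python
# Minimal BERT fine-tuning sketch for text classification (assumed setup,
# using Hugging Face `transformers`, not the authors' original code).
import torch
from transformers import BertTokenizerFast, BertForSequenceClassification

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)          # num_labels depends on the dataset

texts = ["a great movie", "a terrible movie"]    # toy placeholder examples
labels = torch.tensor([1, 0])

# BERT accepts at most 512 tokens; longer documents must be truncated
# (the paper compares head, tail, and head+tail truncation strategies).
enc = tokenizer(texts, padding=True, truncation=True,
                max_length=512, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for epoch in range(3):
    optimizer.zero_grad()
    out = model(**enc, labels=labels)   # cross-entropy loss on the [CLS] head
    out.loss.backward()
    optimizer.step()
```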

Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value (%) | Global Rank |
|---|---|---|---|---|---|
| Text Classification | AG News | BERT-ITPT-FiT | Error | 4.8 | #2 |
| Text Classification | DBpedia | BERT-ITPT-FiT | Error | 0.68 | #3 |
| Sentiment Analysis | IMDb | BERT_large+ITPT | Accuracy | 95.79 | #10 |
| Sentiment Analysis | IMDb | BERT_base+ITPT | Accuracy | 95.63 | #13 |
| Text Classification | Sogou News | BERT-ITPT-FiT | Accuracy | 98.07 | #1 |
| Text Classification | TREC-6 | BERT-ITPT-FiT | Error | 3.2 | #4 |
| Text Classification | Yahoo! Answers | BERT-ITPT-FiT | Accuracy | 77.62 | #1 |
| Text Classification | Yelp-2 | BERT-ITPT-FiT | Accuracy | 98.08 | #2 |
| Text Classification | Yelp-5 | BERT-ITPT-FiT | Accuracy | 70.58 | #4 |
| Sentiment Analysis | Yelp Binary classification | BERT_base+ITPT | Error | 1.92 | #5 |
| Sentiment Analysis | Yelp Binary classification | BERT_large+ITPT | Error | 1.81 | #2 |
| Sentiment Analysis | Yelp Fine-grained classification | BERT_large+ITPT | Error | 28.62 | #2 |
| Sentiment Analysis | Yelp Fine-grained classification | BERT_base+ITPT | Error | 29.42 | #4 |
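
In the model names above, ITPT denotes within-task pre-training (continuing BERT's masked-language-model training on the unlabeled text of the target task) and FiT denotes the subsequent fine-tuning step. The sketch below is one assumed way to realize that further pre-training step with Hugging Face `transformers`; the corpus, sequence length, masking rate, and learning rate are placeholders, not the paper's exact settings.

```python
# Hedged sketch of within-task further pre-training (ITPT): continue BERT's
# masked-language-model objective on unlabeled in-task text before fine-tuning.
import torch
from torch.utils.data import DataLoader
from transformers import (BertTokenizerFast, BertForMaskedLM,
                          DataCollatorForLanguageModeling)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Placeholder in-task corpus; in practice, the unlabeled task/domain text.
task_corpus = ["unlabeled review text ...", "another in-domain document ..."]
enc = tokenizer(task_corpus, truncation=True, max_length=128)
examples = [{"input_ids": ids} for ids in enc["input_ids"]]

# Randomly mask 15% of tokens, as in standard BERT pre-training.
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)
loader = DataLoader(examples, batch_size=2, collate_fn=collator)

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for batch in loader:                      # one pass shown; run more steps in practice
    optimizer.zero_grad()
    loss = model(**batch).loss            # MLM loss on the masked positions
    loss.backward()
    optimizer.step()

model.save_pretrained("bert-itpt")        # then fine-tune this checkpoint as above
```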
