How to Fine-Tune BERT for Text Classification?

14 May 2019 · Chi Sun, Xipeng Qiu, Yige Xu, Xuanjing Huang

Language model pre-training has proven to be useful for learning universal language representations. As a state-of-the-art pre-trained language model, BERT (Bidirectional Encoder Representations from Transformers) has achieved impressive results on many language understanding tasks. In this paper, we conduct exhaustive experiments to investigate different fine-tuning methods of BERT on the text classification task and provide a general solution for BERT fine-tuning. The proposed solution obtains new state-of-the-art results on eight widely studied text classification datasets.
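
The paper's basic setting is standard BERT fine-tuning: encode each document, take the final hidden state of the [CLS] token, and train a softmax classifier on top of it together with the encoder. The sketch below illustrates this setup with the Hugging Face `transformers` library; it is not the authors' code, and the toy data, label count, and hyperparameters (learning rate 2e-5, 3 epochs) are illustrative assumptions rather than the paper's exact configuration.

```python
# Minimal BERT fine-tuning sketch for text classification (assumed setup,
# using Hugging Face `transformers`, not the authors' original code).
import torch
from transformers import BertTokenizerFast, BertForSequenceClassification

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)          # num_labels depends on the dataset

texts = ["a great movie", "a terrible movie"]    # toy placeholder examples
labels = torch.tensor([1, 0])

# BERT accepts at most 512 tokens; longer documents must be truncated
# (the paper compares head, tail, and head+tail truncation strategies).
enc = tokenizer(texts, padding=True, truncation=True,
                max_length=512, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for epoch in range(3):
    optimizer.zero_grad()
    out = model(**enc, labels=labels)   # cross-entropy loss on the [CLS] head
    out.loss.backward()
    optimizer.step()
```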

Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value (%) | Global Rank |
|---|---|---|---|---|---|
| Text Classification | AG News | BERT-ITPT-FiT | Error | 4.8 | #2 |
| Text Classification | DBpedia | BERT-ITPT-FiT | Error | 0.68 | #3 |
| Sentiment Analysis | IMDb | BERT_large+ITPT | Accuracy | 95.79 | #10 |
| Sentiment Analysis | IMDb | BERT_base+ITPT | Accuracy | 95.63 | #13 |
| Text Classification | Sogou News | BERT-ITPT-FiT | Accuracy | 98.07 | #1 |
| Text Classification | TREC-6 | BERT-ITPT-FiT | Error | 3.2 | #4 |
| Text Classification | Yahoo! Answers | BERT-ITPT-FiT | Accuracy | 77.62 | #1 |
| Text Classification | Yelp-2 | BERT-ITPT-FiT | Accuracy | 98.08 | #2 |
| Text Classification | Yelp-5 | BERT-ITPT-FiT | Accuracy | 70.58 | #4 |
| Sentiment Analysis | Yelp Binary classification | BERT_base+ITPT | Error | 1.92 | #5 |
| Sentiment Analysis | Yelp Binary classification | BERT_large+ITPT | Error | 1.81 | #2 |
| Sentiment Analysis | Yelp Fine-grained classification | BERT_large+ITPT | Error | 28.62 | #2 |
| Sentiment Analysis | Yelp Fine-grained classification | BERT_base+ITPT | Error | 29.42 | #4 |
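
In the model names above, ITPT denotes within-task pre-training (continuing BERT's masked-language-model training on the unlabeled text of the target task) and FiT denotes the subsequent fine-tuning step. The sketch below is one assumed way to realize that further pre-training step with Hugging Face `transformers`; the corpus, sequence length, masking rate, and learning rate are placeholders, not the paper's exact settings.

```python
# Hedged sketch of within-task further pre-training (ITPT): continue BERT's
# masked-language-model objective on unlabeled in-task text before fine-tuning.
import torch
from torch.utils.data import DataLoader
from transformers import (BertTokenizerFast, BertForMaskedLM,
                          DataCollatorForLanguageModeling)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Placeholder in-task corpus; in practice, the unlabeled task/domain text.
task_corpus = ["unlabeled review text ...", "another in-domain document ..."]
enc = tokenizer(task_corpus, truncation=True, max_length=128)
examples = [{"input_ids": ids} for ids in enc["input_ids"]]

# Randomly mask 15% of tokens, as in standard BERT pre-training.
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)
loader = DataLoader(examples, batch_size=2, collate_fn=collator)

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for batch in loader:                      # one pass shown; run more steps in practice
    optimizer.zero_grad()
    loss = model(**batch).loss            # MLM loss on the masked positions
    loss.backward()
    optimizer.step()

model.save_pretrained("bert-itpt")        # then fine-tune this checkpoint as above
```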
