Pre-Training with Whole Word Masking for Chinese BERT

19 Jun 2019  ·  Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Ziqing Yang

Bidirectional Encoder Representations from Transformers (BERT) has shown remarkable improvements across various NLP tasks, and successive variants have been proposed to further improve the performance of pre-trained language models. In this paper, we first introduce the whole word masking (wwm) strategy for Chinese BERT, along with a series of Chinese pre-trained language models. We then propose a simple but effective model called MacBERT, which improves upon RoBERTa in several ways; in particular, it uses a new masking strategy called MLM as correction (Mac). To demonstrate the effectiveness of these models, we create a series of Chinese pre-trained language models as our baselines, including BERT, RoBERTa, ELECTRA, and RBT. We carry out extensive experiments on ten Chinese NLP tasks to evaluate the created Chinese pre-trained language models as well as the proposed MacBERT. Experimental results show that MacBERT achieves state-of-the-art performance on many NLP tasks, and we also report ablation studies with several findings that may help future research. We open-source our pre-trained language models to further facilitate the research community. Resources are available at: https://github.com/ymcui/Chinese-BERT-wwm

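As a rough illustration of the two masking strategies named in the abstract, the sketch below applies whole word masking over pre-segmented Chinese words, with an optional hook for a MacBERT-style similar-word substitution. The function name, the per-word sampling scheme, and the `similar_word_fn` callback are illustrative assumptions for this sketch, not the authors' released implementation (which relies on an external Chinese word segmenter and, for MacBERT, a synonym toolkit for choosing replacement words).

```python
import random


def whole_word_mask(words, mask_prob=0.15, mask_token="[MASK]", similar_word_fn=None):
    """Mask whole segmented words instead of individual characters.

    `words` is a list of already-segmented Chinese words, e.g. ["使用", "语言", "模型"].
    Returns the corrupted character sequence and per-character labels
    (the original character at corrupted positions, None elsewhere).
    """
    input_chars, labels = [], []
    for word in words:
        chars = list(word)
        if random.random() < mask_prob:
            if similar_word_fn is None:
                # Plain wwm: every character of the chosen word becomes [MASK],
                # so no word is ever partially masked.
                replacement = [mask_token] * len(chars)
            else:
                # MacBERT-style "MLM as correction": substitute a similar word
                # (assumed here to have the same length) instead of [MASK] tokens.
                replacement = list(similar_word_fn(word))
            input_chars.extend(replacement)
            labels.extend(chars)                 # the model predicts the original characters
        else:
            input_chars.extend(chars)
            labels.extend([None] * len(chars))   # not a prediction target
    return input_chars, labels


if __name__ == "__main__":
    corrupted, labels = whole_word_mask(["使用", "语言", "模型", "来", "预测"], mask_prob=0.3)
    print(corrupted)  # masked characters; output varies with the random draw
    print(labels)     # original characters at masked positions, None elsewhere
```

Note that the original BERT recipe also keeps or randomly replaces a fraction of the selected positions instead of always masking them, and masking is applied at the token level during data generation; those details are omitted here for brevity.
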
| Task | Dataset | Model | Metric | Value | Global Rank |
|------|---------|-------|--------|-------|-------------|
| Chinese Sentence Pair Classification | BQ | RoBERTa-wwm-ext-large | F1 | 85.8 | #1 |
| Chinese Sentence Pair Classification | BQ Dev | RoBERTa-wwm-ext-large | F1 | 86.3 | #1 |
| Sentiment Analysis | ChnSentiCorp | RoBERTa-wwm-ext-large | F1 | 95.8 | #1 |
| Sentiment Analysis | ChnSentiCorp Dev | RoBERTa-wwm-ext-large | F1 | 95.8 | #1 |
| Chinese Reading Comprehension | CJRC | RoBERTa-wwm-ext-large | EM | 62.4 | #1 |
| Chinese Reading Comprehension | CJRC | RoBERTa-wwm-ext-large | F1 | 82.20 | #1 |
| Chinese Reading Comprehension | CJRC Dev | RoBERTa-wwm-ext-large | EM | 62.1 | #1 |
| Chinese Reading Comprehension | CJRC Dev | RoBERTa-wwm-ext-large | F1 | 82.4 | #1 |
| Chinese Reading Comprehension | CMRC 2018 (Chinese Machine Reading Comprehension 2018) | RoBERTa-wwm-ext (single model) | TEST-EM | 72.600 | #21 |
| Chinese Reading Comprehension | CMRC 2018 | RoBERTa-wwm-ext (single model) | TEST-F1 | 89.400 | #18 |
| Chinese Reading Comprehension | CMRC 2018 | RoBERTa-wwm-ext (single model) | CHALLENGE-EM | 26.200 | #15 |
| Chinese Reading Comprehension | CMRC 2018 | RoBERTa-wwm-ext (single model) | CHALLENGE-F1 | 51.000 | #17 |
| Chinese Reading Comprehension | CMRC 2018 | BERT-wwm-ext (single model) | TEST-EM | 71.400 | #22 |
| Chinese Reading Comprehension | CMRC 2018 | BERT-wwm-ext (single model) | TEST-F1 | 87.700 | #24 |
| Chinese Reading Comprehension | CMRC 2018 | BERT-wwm-ext (single model) | CHALLENGE-EM | 24.000 | #20 |
| Chinese Reading Comprehension | CMRC 2018 | BERT-wwm-ext (single model) | CHALLENGE-F1 | 47.300 | #21 |
| Chinese Reading Comprehension | CMRC 2018 | RoBERTa-wwm-ext-large (single model) | TEST-EM | 74.198 | #12 |
| Chinese Reading Comprehension | CMRC 2018 | RoBERTa-wwm-ext-large (single model) | TEST-F1 | 90.604 | #10 |
| Chinese Reading Comprehension | CMRC 2018 | RoBERTa-wwm-ext-large (single model) | CHALLENGE-EM | 31.548 | #5 |
| Chinese Reading Comprehension | CMRC 2018 | RoBERTa-wwm-ext-large (single model) | CHALLENGE-F1 | 60.074 | #5 |
| Chinese Reading Comprehension | CMRC 2018 | BERT-wwm (single model) | TEST-EM | 70.500 | #24 |
| Chinese Reading Comprehension | CMRC 2018 | BERT-wwm (single model) | TEST-F1 | 87.400 | #25 |
| Chinese Reading Comprehension | CMRC 2018 | BERT-wwm (single model) | CHALLENGE-EM | 21.000 | #25 |
| Chinese Reading Comprehension | CMRC 2018 | BERT-wwm (single model) | CHALLENGE-F1 | 47.000 | #22 |
| Chinese Reading Comprehension | CMRC 2018 (Simplified Chinese) | RoBERTa-wwm-ext-large | EM | 74.2 | #1 |
| Chinese Reading Comprehension | CMRC 2018 (Simplified Chinese) | RoBERTa-wwm-ext-large | F1 | 90.6 | #1 |
| Chinese Reading Comprehension | CMRC 2018 (Simplified Chinese) Challenge | RoBERTa-wwm-ext-large | EM | 31.5 | #1 |
| Chinese Reading Comprehension | CMRC 2018 (Simplified Chinese) Challenge | RoBERTa-wwm-ext-large | F1 | 60.1 | #1 |
| Chinese Reading Comprehension | CMRC 2018 (Simplified Chinese) Dev | RoBERTa-wwm-ext-large | EM | 68.5 | #2 |
| Chinese Reading Comprehension | CMRC 2018 (Simplified Chinese) Dev | RoBERTa-wwm-ext-large | F1 | 88.4 | #1 |
| Chinese Reading Comprehension | DRCD (Traditional Chinese) | RoBERTa-wwm-ext-large | EM | 89.6 | #1 |
| Chinese Reading Comprehension | DRCD (Traditional Chinese) | RoBERTa-wwm-ext-large | F1 | 94.5 | #1 |
| Chinese Reading Comprehension | DRCD (Traditional Chinese) Dev | RoBERTa-wwm-ext-large | EM | 89.6 | #2 |
| Chinese Reading Comprehension | DRCD (Traditional Chinese) Dev | RoBERTa-wwm-ext-large | F1 | 94.8 | #1 |
| Chinese Sentence Pair Classification | LCQMC | RoBERTa-wwm-ext-large | F1 | 87 | #3 |
| Chinese Sentence Pair Classification | LCQMC Dev | RoBERTa-wwm-ext-large | F1 | 90.4 | #1 |
| Chinese Document Classification | THUCNews | RoBERTa-wwm-ext-large | F1 | 97.8 | #1 |
| Chinese Document Classification | THUCNews Dev | RoBERTa-wwm-ext-large | F1 | 98.3 | #1 |
| Chinese Sentence Pair Classification | XNLI | RoBERTa-wwm-ext-large | F1 | 81.2 | #1 |
| Chinese Sentence Pair Classification | XNLI Dev | RoBERTa-wwm-ext-large | F1 | 82.1 | #1 |