ZEN: Pre-training Chinese Text Encoder Enhanced by N-gram Representations

2 Nov 2019
Shizhe Diao, Jiaxin Bai, Yan Song, Tong Zhang, Yonggang Wang

The pre-training of text encoders normally processes text as a sequence of tokens corresponding to small text units, such as word pieces in English and characters in Chinese. It omits information carried by larger text granularity, and thus the encoders cannot easily adapt to certain combinations of characters...
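
As a rough illustration of the gap described above, the following minimal sketch contrasts character-level tokens with the larger n-gram units that a character sequence also contains. It is not the authors' method: the function names `char_tokens` and `matched_ngrams`, the `max_n` limit, and the three-entry toy lexicon are all assumptions made for this example.

```python
def char_tokens(text):
    """Split text into single characters, the small unit the abstract mentions for Chinese."""
    return list(text)


def matched_ngrams(chars, lexicon, max_n=4):
    """Return (start, length, ngram) for each span of 2..max_n characters found in the lexicon.

    The lexicon is a plain set of strings; how a real system builds it is
    outside the scope of this sketch.
    """
    found = []
    for start in range(len(chars)):
        for n in range(2, max_n + 1):
            if start + n > len(chars):
                break  # span would run past the end of the sequence
            ngram = "".join(chars[start:start + n])
            if ngram in lexicon:
                found.append((start, n, ngram))
    return found


# Hypothetical toy lexicon, chosen only for illustration.
lexicon = {"北京", "大学", "北京大学"}

chars = char_tokens("北京大学")
ngrams = matched_ngrams(chars, lexicon)

print(chars)   # the character-level view of the input
print(ngrams)  # the larger units a character-only view does not expose directly
```

A character-level encoder sees only the four separate characters, while the lexicon match also surfaces the overlapping multi-character units that span them.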

| Task | Dataset | Model | Metric | Value | Global rank |
|---|---|---|---|---|---|
| Chinese Sentiment Analysis | ChnSentiCorp | ZEN (Init with Chinese BERT) | F1 | 96.08 | 1 |
| Chinese Sentiment Analysis | ChnSentiCorp | ZEN (Random Init) | F1 | 94.42 | 2 |
| Chinese Sentiment Analysis | ChnSentiCorp Dev | ZEN (Init with Chinese BERT) | F1 | 95.66 | 1 |
| Chinese Sentiment Analysis | ChnSentiCorp Dev | ZEN (Random Init) | F1 | 94.87 | 2 |
| Chinese Part-of-Speech Tagging | CTB5 | ZEN (Random Init) | F1 | 95.82 | 3 |
| Chinese Part-of-Speech Tagging | CTB5 | ZEN (Init with Chinese BERT) | F1 | 96.64 | 1 |
| Chinese Part-of-Speech Tagging | CTB5 Dev | ZEN (Random Init) | F1 | 96.12 | 2 |
| Chinese Part-of-Speech Tagging | CTB5 Dev | ZEN (Init with Chinese BERT) | F1 | 97.43 | 1 |
| Chinese Sentence Pair Classification | LCQMC | ZEN (Init with Chinese BERT) | F1 | 87.95 | 2 |
| Chinese Sentence Pair Classification | LCQMC | ZEN (Random Init) | F1 | 85.27 | 4 |
| Chinese Sentence Pair Classification | LCQMC Dev | ZEN (Init with Chinese BERT) | F1 | 90.2 | 2 |
| Chinese Sentence Pair Classification | LCQMC Dev | ZEN (Random Init) | F1 | 88.1 | 3 |
| Chinese Word Segmentation | MSR | ZEN (Random Init) | F1 | 97.89 | 3 |
| Chinese Word Segmentation | MSR | ZEN (Init with Chinese BERT) | F1 | 98.35 | 1 |
| Chinese Named Entity Recognition | MSRA | ZEN (Random Init) | F1 | 93.24 | 9 |
| Chinese Named Entity Recognition | MSRA | ZEN (Init with Chinese BERT) | F1 | 95.25 | 5 |
| Chinese Document Classification | THUCNews | ZEN (Init with Chinese BERT) | F1 | 97.64 | 2 |
| Chinese Document Classification | THUCNews | ZEN (Random Init) | F1 | 96.87 | 3 |
| Chinese Document Classification | THUCNews Dev | ZEN (Random Init) | F1 | 97.2 | 3 |
| Chinese Document Classification | THUCNews Dev | ZEN (Init with Chinese BERT) | F1 | 97.66 | 2 |
| Chinese Sentence Pair Classification | XNLI | ZEN (Random Init) | F1 | 77.03 | 3 |
| Chinese Sentence Pair Classification | XNLI | ZEN (Init with Chinese BERT) | F1 | 79.2 | 2 |
| Chinese Sentence Pair Classification | XNLI Dev | ZEN (Init with Chinese BERT) | F1 | 80.48 | 2 |
| Chinese Sentence Pair Classification | XNLI Dev | ZEN (Random Init) | F1 | 77.11 | 3 |