Language Models

CPM-2 is an 11-billion-parameter pre-trained language model based on a standard Transformer architecture consisting of a bidirectional encoder and a unidirectional decoder. The model is pre-trained on WuDaoCorpus, which contains 2.3 TB of cleaned Chinese data as well as 300 GB of cleaned English data. The pre-training process of CPM-2 is divided into three stages: Chinese pre-training, bilingual pre-training, and MoE pre-training. This multi-stage training with knowledge inheritance significantly reduces the computation cost.

Source: CPM-2: Large-scale Cost-effective Pre-trained Language Models
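The sketch below illustrates the encoder-decoder pattern described above: a bidirectional encoder that attends over the full input, and a unidirectional decoder constrained by a causal mask. It is a minimal illustration only; the layer counts, hidden size, and vocabulary size are placeholders, not the actual CPM-2 configuration, and this is not the CPM-2 implementation.

```python
import torch
import torch.nn as nn

# Minimal encoder-decoder Transformer sketch (hypothetical hyperparameters,
# not the real CPM-2 configuration).
class EncoderDecoderLM(nn.Module):
    def __init__(self, vocab_size=32000, d_model=512, nhead=8,
                 num_encoder_layers=6, num_decoder_layers=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_encoder_layers,
            num_decoder_layers=num_decoder_layers,
            batch_first=True,
        )
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_ids):
        src = self.embed(src_ids)  # encoder input: attends bidirectionally
        tgt = self.embed(tgt_ids)  # decoder input
        # Causal mask keeps the decoder unidirectional (left-to-right).
        causal_mask = self.transformer.generate_square_subsequent_mask(tgt_ids.size(1))
        hidden = self.transformer(src, tgt, tgt_mask=causal_mask)
        return self.lm_head(hidden)  # next-token logits

model = EncoderDecoderLM()
src = torch.randint(0, 32000, (2, 16))  # source/context tokens
tgt = torch.randint(0, 32000, (2, 16))  # target tokens
logits = model(src, tgt)                # shape: (2, 16, 32000)
```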

Components


Component: Transformer (Type: Transformers)

Categories

Language Models