CPM-2

Introduced by Zhang et al. in CPM-2: Large-scale Cost-effective Pre-trained Language Models

CPM-2 is an 11-billion-parameter pre-trained language model based on a standard Transformer architecture, consisting of a bidirectional encoder and a unidirectional decoder. The model is pre-trained on WuDaoCorpus, which contains 2.3 TB of cleaned Chinese data and 300 GB of cleaned English data. Pre-training is divided into three stages: Chinese pre-training, bilingual pre-training, and MoE (mixture-of-experts) pre-training. This multi-stage training with knowledge inheritance, in which each stage builds on the model produced by the previous one, can significantly reduce the computation cost.

Source: CPM-2: Large-scale Cost-effective Pre-trained Language Models
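
The staged schedule described above can be pictured as a chain of checkpoints, each one initialised from its predecessor. The following is a minimal sketch of that idea only, not the authors' training code: the `Checkpoint` class, the `pretrain` placeholder, and the stage names are hypothetical, and no real training or model weights are involved.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Checkpoint:
    """Hypothetical stand-in for a saved model state."""
    stage: str
    parent: Optional[str]  # stage whose weights were inherited, if any
    lineage: List[str] = field(default_factory=list)


def pretrain(stage: str, init: Optional[Checkpoint]) -> Checkpoint:
    """Placeholder for one pre-training stage.

    A real stage would load the weights in `init` (when given) and continue
    training; here we only record which stages the result descends from.
    """
    inherited = list(init.lineage) if init else []
    return Checkpoint(
        stage=stage,
        parent=init.stage if init else None,
        lineage=inherited + [stage],
    )


# Stage order as listed in the description (names are illustrative).
STAGES = ["chinese", "bilingual", "moe"]


def run_pipeline(stages: List[str]) -> Optional[Checkpoint]:
    """Run the stages in order, passing each checkpoint to the next stage."""
    ckpt: Optional[Checkpoint] = None
    for stage in stages:
        ckpt = pretrain(stage, init=ckpt)
    return ckpt


final = run_pipeline(STAGES)
print(final.stage, "inherits from", final.parent)
print("lineage:", final.lineage)
```

The point of the sketch is that only the first stage starts without a parent; every later stage reuses what was already learned, which is where the cost saving claimed for knowledge inheritance would come from.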

Components

Component	Type
Transformer	Transformers

Categories

Language Models
