MMCU (Measuring Massive Multitask Chinese Understanding)

Introduced by Zeng in Measuring Massive Multitask Chinese Understanding

We propose a test to measure the multitask accuracy of large Chinese language models. The test is large-scale and multi-task, consisting of single-choice and multiple-choice questions from several branches of knowledge. It covers four fields: medicine, law, psychology, and education, with medicine divided into 15 sub-tasks and education into 8 sub-tasks. The questions were manually collected by professionals from freely available online resources, including university medical examinations, national unified legal professional qualification examinations, psychological counselor exams, graduate entrance examinations for psychology majors, and the Chinese National College Entrance Examination. In total, we collected 11,900 questions, which we divided into a few-shot development set and a test set. The few-shot development set contains 5 questions per topic, 55 questions in total. The test set comprises the remaining 11,845 questions.
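The split sizes quoted above can be checked with a little arithmetic. The sketch below is illustrative only: the constant and function names are hypothetical and are not part of any released MMCU code. It uses nothing but the figures stated in the description.

```python
# Figures taken from the dataset description above.
TOTAL_QUESTIONS = 11_900
DEV_QUESTIONS = 55
DEV_QUESTIONS_PER_TOPIC = 5
TEST_QUESTIONS = 11_845


def check_split(total: int, dev: int, test: int, per_topic: int) -> int:
    """Verify that dev + test == total and return the implied topic count.

    The name and signature are hypothetical, chosen for illustration.
    """
    if dev + test != total:
        raise ValueError(f"split sizes do not add up: {dev} + {test} != {total}")
    topics, remainder = divmod(dev, per_topic)
    if remainder != 0:
        raise ValueError("development set is not a whole number of topics")
    return topics


dev_topics = check_split(
    TOTAL_QUESTIONS, DEV_QUESTIONS, TEST_QUESTIONS, DEV_QUESTIONS_PER_TOPIC
)
print(dev_topics)  # number of topics implied by 55 questions at 5 per topic
```

The figures are consistent: 55 + 11,845 = 11,900, and 55 questions at 5 per topic implies 11 topics in the development set. The description does not say how those topics map onto the four fields and their sub-tasks.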
