Measuring Massive Multitask Language Understanding

We propose a new test to measure a text model's multitask accuracy. The test covers 57 tasks including elementary mathematics, US history, computer science, law, and more...
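"Multitask accuracy" can be summarized in more than one way, and the abstract above does not say which aggregation is used. The sketch below shows two common options under that caveat: a macro average, where each task's accuracy counts equally, and a micro average, where every question counts equally. The task names and toy results are hypothetical.

```python
from typing import Dict, List

# Hypothetical layout: task name -> list of per-question outcomes (True = correct).
Results = Dict[str, List[bool]]


def per_task_accuracy(results: Results) -> Dict[str, float]:
    """Fraction of correctly answered questions within each task."""
    return {task: sum(outcomes) / len(outcomes) for task, outcomes in results.items()}


def macro_accuracy(results: Results) -> float:
    """Unweighted mean of the per-task accuracies (each task counts equally)."""
    accs = per_task_accuracy(results)
    return sum(accs.values()) / len(accs)


def micro_accuracy(results: Results) -> float:
    """Accuracy pooled over all questions (larger tasks weigh more)."""
    correct = sum(sum(outcomes) for outcomes in results.values())
    total = sum(len(outcomes) for outcomes in results.values())
    return correct / total


# Hypothetical toy results for two of the tasks.
toy = {
    "elementary_mathematics": [True, False, True, True],
    "us_history": [False, True],
}

print(per_task_accuracy(toy))  # {'elementary_mathematics': 0.75, 'us_history': 0.5}
print(macro_accuracy(toy))     # 0.625
```

The two averages diverge when tasks have different numbers of questions: here the macro average is 0.625 while the pooled micro average is 4/6, about 0.667.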


Results from the Paper


Task                | Dataset        | Model         | Metric       | Value | Global rank
Multi-Task Learning | Hendrycks Test | Random Chance | Accuracy (%) | 25.0  | #3
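The 25.0% random-chance baseline follows from the question format. Assuming every question has four answer choices, as in the test's multiple-choice format, a uniform random guess is correct with probability 1/4. The sketch below computes that expectation and checks it with a seeded simulation. Both function names are hypothetical helpers, not part of any released evaluation code.

```python
import random


def expected_chance_accuracy(num_choices: int) -> float:
    """Expected accuracy (%) of uniform guessing on k-choice questions: 100 / k."""
    return 100.0 / num_choices


def simulated_chance_accuracy(num_questions: int, num_choices: int, seed: int = 0) -> float:
    """Accuracy (%) of uniform random guesses against uniformly random answer keys."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(num_questions):
        answer = rng.randrange(num_choices)  # hypothetical answer key
        guess = rng.randrange(num_choices)   # the random-chance "model"
        correct += guess == answer
    return 100.0 * correct / num_questions


print(expected_chance_accuracy(4))  # 25.0
```

With 100,000 simulated questions the estimate lands within a fraction of a percentage point of 25.0, since the standard error is about 0.14 points.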
