The Korean Language Understanding Evaluation (KLUE) benchmark is a collection of datasets for evaluating the natural language understanding capabilities of Korean language models. KLUE consists of 8 diverse and representative tasks, which are accessible to anyone without any restrictions. With ethical considerations in mind, we deliberately design annotation guidelines to obtain unambiguous annotations for all datasets. Furthermore, we build an evaluation system and carefully choose evaluation metrics for every task, thus establishing fair comparison across Korean language models.
19 PAPERS • 1 BENCHMARK
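A minimal sketch of loading one KLUE task with the Hugging Face `datasets` library, assuming the benchmark is published on the Hub under the `klue` identifier with per-task configuration names (the `"ynat"` topic-classification config used here is one illustrative choice among the 8 tasks):

```python
# Sketch: load a single KLUE task and inspect its training split.
# Assumes the benchmark is hosted on the Hugging Face Hub as "klue"
# with per-task configs (e.g. "ynat" for topic classification).
from datasets import load_dataset

dataset = load_dataset("klue", "ynat")

# Print split sizes and one example record.
print(dataset["train"])
print(dataset["train"][0])
```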
533 parallel examples sampled from TACRED and translated into Russian and Korean (plus 3 additional examples in Russian), accompanied by a translation of a list of trigger words collected for the different relations.
1 PAPER • NO BENCHMARKS YET