GLUE-X

Introduced by Yang et al. in GLUE-X: Evaluating Natural Language Understanding Models from an Out-of-distribution Generalization Perspective

GLUE-X is a benchmark for evaluating the out-of-distribution (OOD) robustness of Natural Language Understanding (NLU) models. It was created to address OOD generalization, which remains a challenge in many NLP tasks and limits the real-world deployment of NLP methods. GLUE-X gathers 14 publicly available datasets that serve as OOD test data, and evaluations are conducted on 8 classic NLP tasks using widely used models. These evaluations show significant performance degradation in all settings relative to in-distribution (ID) accuracy, which underscores the need to improve OOD accuracy in NLP. The creators hope that GLUE-X will draw attention to the importance of OOD robustness and provide insight into how to measure a model's robustness and how to improve it.
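The central measurement behind these findings is the gap between a model's ID accuracy and its OOD accuracy. The sketch below is a minimal illustration of that comparison for a single classification task. It assumes toy, made-up labels and predictions that are not drawn from GLUE-X. The helper names `accuracy` and `ood_gap` are hypothetical, and the benchmark itself may aggregate scores across tasks and datasets differently.

```python
from typing import Sequence


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    if not labels or len(preds) != len(labels):
        raise ValueError("preds and labels must be non-empty and equal length")
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def ood_gap(id_acc: float, ood_acc: float) -> tuple[float, float]:
    """Hypothetical helper: (absolute, relative) drop from ID to OOD accuracy."""
    absolute = id_acc - ood_acc
    relative = absolute / id_acc if id_acc > 0 else 0.0
    return absolute, relative


# Toy data for illustration only; these are NOT GLUE-X results.
id_labels = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
id_preds = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]    # 9 of 10 correct
ood_labels = [1, 0, 0, 1, 1, 0, 1, 0, 1, 0]
ood_preds = [1, 1, 0, 0, 1, 0, 0, 0, 1, 1]   # 6 of 10 correct

id_acc = accuracy(id_preds, id_labels)
ood_acc = accuracy(ood_preds, ood_labels)
abs_drop, rel_drop = ood_gap(id_acc, ood_acc)

print(f"ID acc={id_acc:.2f}, OOD acc={ood_acc:.2f}, "
      f"drop={abs_drop:.2f} ({rel_drop:.1%})")
# prints "ID acc=0.90, OOD acc=0.60, drop=0.30 (33.3%)"
```

Reporting the relative drop alongside the absolute one makes gaps comparable across tasks whose ID accuracies differ.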

Similar Datasets

SafetyBench

CSCD-IME

CValues

WHYSHIFT
