GLUE-X is a benchmark for evaluating the out-of-distribution (OOD) robustness of Natural Language Understanding (NLU) models. It was created to address OOD generalization, which remains a challenge for many NLP tasks and limits the real-world deployment of NLP methods. GLUE-X uses 14 publicly available datasets as OOD test data and evaluates popular models on 8 classic NLP tasks. In every setting, the evaluations show a significant drop in OOD accuracy relative to in-distribution (ID) accuracy, which underscores the need to improve OOD performance. The creators of GLUE-X hope the benchmark will draw attention to OOD robustness and offer insight into how to measure a model's robustness and how to improve it.
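One simple way to express the ID-versus-OOD gap for a classification task scored by accuracy is to compute accuracy on both test sets and report the absolute and relative drop. The sketch below assumes integer class labels. The helper names `accuracy` and `ood_gap` are hypothetical and not part of any GLUE-X tooling, the toy predictions are invented for illustration (they are not GLUE-X results), and the benchmark itself may use other metrics for some tasks.

```python
from typing import Dict, Sequence


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(preds) != len(labels) or not labels:
        raise ValueError("preds and labels must be non-empty and of equal length")
    correct = sum(p == y for p, y in zip(preds, labels))
    return correct / len(labels)


def ood_gap(
    id_preds: Sequence[int],
    id_labels: Sequence[int],
    ood_preds: Sequence[int],
    ood_labels: Sequence[int],
) -> Dict[str, float]:
    """Hypothetical helper: compare in-distribution and OOD accuracy."""
    id_acc = accuracy(id_preds, id_labels)
    ood_acc = accuracy(ood_preds, ood_labels)
    drop = id_acc - ood_acc
    return {
        "id_acc": id_acc,
        "ood_acc": ood_acc,
        "absolute_drop": drop,
        # Relative drop is undefined when ID accuracy is zero.
        "relative_drop": drop / id_acc if id_acc > 0 else float("nan"),
    }


# Toy predictions for a single task (invented numbers, not GLUE-X results).
result = ood_gap(
    id_preds=[1, 0, 1, 1, 0, 1, 0, 1],
    id_labels=[1, 0, 1, 1, 0, 1, 1, 1],
    ood_preds=[1, 1, 0, 1, 0, 0, 1, 1],
    ood_labels=[1, 0, 1, 1, 0, 1, 0, 1],
)
print(result)
```

Here the toy model gets 7 of 8 ID examples right (0.875) but only 4 of 8 OOD examples (0.5), an absolute drop of 0.375. Reporting the relative drop as well makes gaps comparable across tasks whose ID accuracies differ.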

License

  • Unknown
