This dataset tests whether language models correctly capture the meaning of words of estimative probability (WEP), i.e. words that express probabilities, such as "probably", "maybe", "surely", and "impossible".
We used probabilistic soft logic to combine probabilistic statements expressed with WEP (WEP-Reasoning). We also used the UNLI dataset (https://nlp.jhu.edu/unli/) to directly check whether models can identify the WEP that matches a human-annotated probability. The dataset can be used either as natural language inference data (context, hypothesis, label) or as multiple-choice question answering data (context, valid_hypothesis, invalid_hypothesis).
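The two formats carry the same information: each multiple-choice item can be expanded into two NLI pairs, and multiple-choice accuracy can be computed by comparing a model's scores for the two hypotheses. The sketch below is a minimal illustration under stated assumptions: the record layout uses the field names listed above, the 1 = valid / 0 = invalid label encoding is assumed, the example record is made up (it is not an item from the dataset), and `score` is a hypothetical placeholder for any model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical record layout; field names follow the card's wording,
# but the actual dataset columns may differ.
@dataclass
class MultipleChoiceExample:
    context: str
    valid_hypothesis: str
    invalid_hypothesis: str

# Assumed label encoding for the NLI view.
VALID, INVALID = 1, 0

def to_nli(example: MultipleChoiceExample) -> List[Tuple[str, str, int]]:
    """Expand one multiple-choice item into two (context, hypothesis, label) pairs."""
    return [
        (example.context, example.valid_hypothesis, VALID),
        (example.context, example.invalid_hypothesis, INVALID),
    ]

def mc_accuracy(
    examples: List[MultipleChoiceExample],
    score: Callable[[str, str], float],
) -> float:
    """Fraction of items where the valid hypothesis outscores the invalid one.

    `score(context, hypothesis)` is a placeholder for any model's plausibility score.
    """
    if not examples:
        return 0.0
    correct = sum(
        score(ex.context, ex.valid_hypothesis) > score(ex.context, ex.invalid_hypothesis)
        for ex in examples
    )
    return correct / len(examples)

# Made-up illustration, not taken from the dataset.
example = MultipleChoiceExample(
    context="The weather service says rain is likely tomorrow.",
    valid_hypothesis="It will probably rain tomorrow.",
    invalid_hypothesis="It will certainly not rain tomorrow.",
)
print(to_nli(example))

# Toy scorer standing in for a real model: it simply rewards the word "probably".
toy_score = lambda context, hypothesis: 1.0 if "probably" in hypothesis else 0.0
print(mc_accuracy([example], toy_score))  # → 1.0
```

Keeping the multiple-choice view as a pairwise comparison avoids having to calibrate a decision threshold, while the NLI view lets the same items be fed to any standard entailment classifier.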