In Search of the Long-Tail: Systematic Generation of Long-Tail Inferential Knowledge via Logical Rule Guided Search

State-of-the-art LLMs outperform humans on reasoning tasks such as Natural Language Inference. Recent works evaluating LLMs note a marked performance drop on input data from the low-probability distribution, i.e., the long tail. We therefore focus on systematically generating statements involving long-tail inferential knowledge for more effective evaluation of LLMs on reasoning. We first propose a novel framework, Logic-Induced-Knowledge-Search (LINK), that generates factually correct, long-tail knowledge statements grounded in symbolic rule templates; LINK effectively generates data in the long-tail distribution that zero-shot prompted LLMs are unable to reach, and outperforms zero-shot GPT-4 on factual correctness by 5%. We further use the data generated by LINK to construct Logic-Induced-Long-Tail (LINT), a dataset for evaluating downstream models on the long-tail distribution; LINT contains 108K knowledge statements spanning four domains. Using LINT, we test LLMs on an entailment classification task and find that model performance drops by as much as 5% on the long-tail distribution compared to the head distribution. Our work demonstrates the utility of evaluating models on the long-tail distribution and calls for more research on generating evaluation data in the long tail.
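The abstract gives only the high-level idea of LINK: search for statements that instantiate a symbolic rule template, keep only rare (long-tail) instantiations, and verify factual correctness. As a rough illustration of that idea only, the sketch below enumerates slot fillers for a rule template, filters out high-frequency (head) fillers, and retains instantiations that pass a verifier. The `RuleTemplate` class, `search_long_tail` function, frequency scorer, and verifier are hypothetical stand-ins for exposition, not the paper's actual components or API.

```python
# Illustrative sketch of logical-rule-guided search for long-tail statements.
# All names and the search strategy are assumptions for exposition; they are
# not LINK's actual implementation.

from dataclasses import dataclass


@dataclass
class RuleTemplate:
    """A symbolic rule such as: buy(P, X) -> own(P, X)."""
    premise: str        # e.g. "{p} bought a {x}"
    conclusion: str     # e.g. "{p} owns a {x}"
    slots: tuple        # e.g. ("p", "x")


def search_long_tail(rule, candidates, frequency, verifier, tail_cutoff=0.01):
    """Instantiate a rule template and keep only rare, verified statements.

    candidates: dict mapping slot name -> list of candidate fillers
    frequency:  callable estimating how common a filler is (e.g. corpus counts)
    verifier:   callable checking factual correctness of a full statement
    """
    statements = []

    def expand(slot_idx, bindings):
        # Once every slot is bound, render and verify the statement pair.
        if slot_idx == len(rule.slots):
            premise = rule.premise.format(**bindings)
            conclusion = rule.conclusion.format(**bindings)
            if verifier(premise) and verifier(conclusion):
                statements.append((premise, conclusion))
            return
        slot = rule.slots[slot_idx]
        for filler in candidates[slot]:
            # Long-tail filter: skip high-frequency (head) fillers.
            if frequency(filler) > tail_cutoff:
                continue
            expand(slot_idx + 1, {**bindings, slot: filler})

    expand(0, {})
    return statements


# Toy usage with hand-rolled candidate lists and scoring functions.
rule = RuleTemplate("{p} bought a {x}", "{p} owns a {x}", ("p", "x"))
fillers = {"p": ["a collector", "a farmer"], "x": ["theremin", "guitar"]}
freq = lambda w: {"guitar": 0.3}.get(w, 0.001)  # "guitar" is head; rest tail
ok = lambda s: True                             # stand-in for a real verifier
print(search_long_tail(rule, fillers, freq, ok))
```

In this toy run, "guitar" is filtered out as a head filler, so only the rarer "theremin" statements survive, mirroring the abstract's point that long-tail data must be reached by guided search rather than by default model sampling.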


Datasets


LINT (Logic-Induced-Long-Tail): 108K long-tail knowledge statements spanning four domains, introduced in this paper.
