NPHardEval is a dynamic benchmark designed to assess the reasoning abilities of Large Language Models (LLMs) across 900 algorithmic questions spanning complexity classes up to NP-hard. Let's delve into the details:
Introducing NPHardEval: Static benchmarks are vulnerable to overfitting and data contamination; once their questions circulate, models can score well through memorization rather than genuine reasoning. NPHardEval addresses these limitations by rigorously evaluating LLMs' reasoning abilities on tasks that extend up to the NP-hard complexity class.
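As a rough illustration of what such a question can look like, here is a minimal sketch that poses a classic NP-hard problem, the Traveling Salesman Problem, as an LLM prompt. This is a hypothetical generator for illustration only; the function name and prompt format are assumptions, not NPHardEval's actual pipeline:

```python
import random

def make_tsp_question(n_cities: int = 5, seed: int = 0) -> str:
    """Hypothetical generator: pose a random TSP instance as an LLM prompt."""
    rng = random.Random(seed)
    # Build a symmetric distance matrix with a zero diagonal.
    dist = [[0] * n_cities for _ in range(n_cities)]
    for i in range(n_cities):
        for j in range(i + 1, n_cities):
            dist[i][j] = dist[j][i] = rng.randint(1, 20)
    rows = "\n".join(" ".join(f"{d:2d}" for d in row) for row in dist)
    return (
        f"You are given {n_cities} cities and the distance matrix below.\n"
        f"{rows}\n"
        "Find the shortest tour that visits every city exactly once and "
        "returns to the start. Report the tour and its total length."
    )

print(make_tsp_question())
```

Because the instance is derived from a seed, grading is mechanical (the shortest tour can be computed exactly for small instances) while the question itself remains freshly generated.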
Key Features of NPHardEval:
Dynamic Update Mechanism: Unlike static benchmarks, NPHardEval refreshes its datapoints monthly. These regular updates mitigate the risk of overfitting and memorization, ensuring a more accurate and reliable assessment of LLMs' reasoning capabilities; the sketch below illustrates the idea.
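The actual refresh pipeline lives in the casmlab/NPHardEval repository; the core idea can be sketched by deriving instance seeds from the release month, so each monthly release produces fresh datapoints that models cannot have memorized. The snippet below is an assumption-laden illustration and reuses the hypothetical make_tsp_question generator from the earlier sketch:

```python
import datetime

def monthly_seeds(n_questions: int = 100) -> list[int]:
    """Derive deterministic per-release seeds (e.g. 2024-06 -> base 202406).

    Each month maps to a disjoint seed range, so freshly generated
    instances differ from those in every earlier release."""
    base = int(datetime.date.today().strftime("%Y%m"))
    return [base * 1_000 + i for i in range(n_questions)]

# Regenerate this month's question set with the earlier TSP sketch:
questions = [make_tsp_question(seed=s) for s in monthly_seeds(3)]
```

Seeded regeneration keeps releases reproducible (a given month always yields the same questions) while still rotating the datapoints that models are tested on.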
Research Contribution: NPHardEval contributes a task suite grounded in the well-defined hierarchy of computational complexity classes, which lets reasoning performance be measured against known problem hardness, together with a dynamic update mechanism that keeps the benchmark resistant to memorization.
In summary, NPHardEval provides a comprehensive evaluation framework for assessing LLMs' reasoning abilities through the lens of computational complexity classes. 🌟