Search Results for author: Mohammed Alsobay

Found 1 paper, 1 paper with code

The RealHumanEval: Evaluating Large Language Models' Abilities to Support Programmers

1 code implementation • 3 Apr 2024 • Hussein Mozannar, Valerie Chen, Mohammed Alsobay, Subhro Das, Sebastian Zhao, Dennis Wei, Manish Nagireddy, Prasanna Sattigeri, Ameet Talwalkar, David Sontag

Evaluation of large language models (LLMs) for code has primarily relied on static benchmarks, including HumanEval (Chen et al., 2021), which measure the ability of LLMs to generate complete code that passes unit tests.
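The abstract above describes how static benchmarks such as HumanEval score an LLM: a generated completion is accepted only if the resulting program passes the task's unit tests. A minimal sketch of that pass/fail check is below; the task prompt, the `add` function, and the specific assertions are hypothetical illustrations, not drawn from the benchmark itself.

```python
# Sketch of a HumanEval-style functional-correctness check: a model
# completion counts as correct only if the assembled program passes
# the task's unit tests. Task and completion here are hypothetical.

task_prompt = "def add(a, b):\n"         # hypothetical task prompt
model_completion = "    return a + b\n"  # hypothetical model output

def passes_unit_tests(prompt: str, completion: str) -> bool:
    """Execute the candidate program, then run the task's unit tests."""
    namespace = {}
    try:
        # Assemble and execute the candidate program.
        exec(prompt + completion, namespace)
        # Task-specific unit tests (illustrative).
        assert namespace["add"](2, 3) == 5
        assert namespace["add"](-1, 1) == 0
    except Exception:
        return False
    return True

print(passes_unit_tests(task_prompt, model_completion))
```

Benchmarks built on this scheme report metrics like pass@k, the probability that at least one of k sampled completions passes all tests.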
