1 code implementation • 28 Mar 2024 • Chunqiu Steven Xia, Yinlin Deng, Lingming Zhang
Such limitations inevitably lead us to inquire: Is the leaderboard performance on existing benchmarks reliable and comprehensive enough to measure the program synthesis ability of LLMs?
1 code implementation • 1 Sep 2023 • Yuxiang Wei, Chunqiu Steven Xia, Lingming Zhang
Therefore, we propose Repilot, a general code generation framework to further copilot the AI "copilots" (i.e., LLMs) by synthesizing more valid patches during the repair process.
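A minimal sketch of the co-generation idea: an LLM proposes candidate next tokens while a completion engine prunes those that cannot extend to a valid program. All names here (`llm.next_tokens`, `engine.is_valid`, `engine.complete`) are illustrative placeholders, not Repilot's actual API.

```python
# Illustrative sketch only: an LLM and a completion engine co-generate
# a patch token by token. The llm/engine interfaces are assumptions.

def synthesize_patch(llm, engine, prefix, max_tokens=64):
    """Generate a patch, keeping only tokens the completion engine
    accepts as the start of a valid continuation."""
    patch = prefix
    for _ in range(max_tokens):
        # The LLM proposes candidate next tokens with probabilities.
        candidates = llm.next_tokens(patch, top_k=20)
        # The engine prunes tokens that make the partial patch invalid.
        valid = [(tok, p) for tok, p in candidates
                 if engine.is_valid(patch + tok)]
        if not valid:
            # Fall back: let the engine proactively complete the patch.
            patch = engine.complete(patch)
            break
        tok, _ = max(valid, key=lambda tp: tp[1])
        patch += tok
        if tok == "<eos>":
            break
    return patch
```

The design point is that pruning happens during generation rather than after it, so fewer candidate patches are discarded at validation time.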
1 code implementation • 9 Aug 2023 • Chunqiu Steven Xia, Matteo Paltenghi, Jia Le Tian, Michael Pradel, Lingming Zhang
Moreover, the inputs generated by existing fuzzers are often limited to specific features of the input language, and thus can hardly reveal bugs related to other or new features.
1 code implementation • NeurIPS 2023 • Jiawei Liu, Chunqiu Steven Xia, Yuyao Wang, Lingming Zhang
While EvalPlus is general, we extend the test cases of the popular HumanEval benchmark by 80x to build HumanEval+.
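One way to picture this kind of test-suite augmentation is differential testing: run a candidate solution and a trusted reference on many automatically generated inputs and flag any divergence. The sketch below is an assumption-laden toy, not EvalPlus's actual input generator.

```python
# Toy sketch of test augmentation via mutation + differential checking.
# The mutate() strategy is a stand-in for a real input generator.

import random

def mutate(inp):
    """Crude type-aware mutation: perturb ints, shuffle lists."""
    if isinstance(inp, int):
        return inp + random.randint(-10, 10)
    if isinstance(inp, list):
        out = list(inp)
        random.shuffle(out)
        return out
    return inp

def augmented_check(candidate, reference, seed_inputs, rounds=80):
    """Compare candidate against the reference on many mutated inputs."""
    for seed in seed_inputs:
        for _ in range(rounds):
            x = mutate(seed)
            if candidate(x) != reference(x):
                return False, x  # divergence: candidate is wrong here
    return True, None
```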
no code implementations • 1 Apr 2023 • Chunqiu Steven Xia, Lingming Zhang
For earlier patches that failed to pass all tests, we combine the incorrect patches with their corresponding relevant test failure information to construct a new prompt for the LLM to generate the next patch.
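A hedged sketch of that prompt construction step follows; the template wording and field names are assumptions for illustration, not the paper's exact prompt.

```python
# Illustrative prompt builder: fold each rejected patch and its test
# failure back into the next request so the model can learn from them.

def build_repair_prompt(buggy_code, failed_attempts):
    """failed_attempts is a list of (patch, failure_message) pairs."""
    parts = [f"The following code is buggy:\n{buggy_code}\n"]
    for patch, failure in failed_attempts:
        parts.append(
            "This candidate patch was incorrect:\n"
            f"{patch}\n"
            f"It failed with:\n{failure}\n"
        )
    parts.append("Generate a different patch that passes all tests.")
    return "\n".join(parts)
```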
no code implementations • 18 Mar 2023 • Chunqiu Steven Xia, Yifeng Ding, Lingming Zhang
Traditional APR tools have largely leveraged the plastic surgery hypothesis by designing manual or heuristic-based approaches to exploit such existing code ingredients.
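As a toy illustration of the plastic surgery hypothesis, a repair tool can harvest code "ingredients" already present in the project and rank them as candidate replacements for a suspicious line. The overlap heuristic below is purely illustrative.

```python
# Toy ingredient mining in the plastic-surgery spirit: reuse lines
# that already exist elsewhere in the same project.

from pathlib import Path

def harvest_ingredients(project_dir, suffix=".py"):
    """Collect distinct non-empty source lines from the project."""
    lines = set()
    for path in Path(project_dir).rglob(f"*{suffix}"):
        for line in path.read_text(errors="ignore").splitlines():
            line = line.strip()
            if line:
                lines.add(line)
    return lines

def candidate_patches(buggy_line, ingredients):
    """Rank existing lines by crude token overlap with the buggy line."""
    buggy_tokens = set(buggy_line.split())
    scored = [(len(buggy_tokens & set(l.split())), l) for l in ingredients]
    return [l for score, l in sorted(scored, reverse=True) if score > 0]
```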
no code implementations • 30 Jan 2023 • Chunqiu Steven Xia, Lingming Zhang
As such, we leverage the long-term context window of LLMs to not only avoid generating previously incorrect patches but also incorporate validation feedback to help the model understand the semantic meaning of the program under test.
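A minimal sketch of such a conversational repair loop, assuming a generic chat-style LLM client; the `chat` function, message roles, and `run_tests` interface are placeholders rather than the paper's implementation.

```python
# Sketch of conversation-driven repair: the full dialogue stays in
# context, so the model remembers rejected patches and sees the
# validation feedback for each attempt.

def conversational_repair(chat, buggy_code, run_tests, max_turns=10):
    history = [{"role": "user",
                "content": f"Fix this buggy function:\n{buggy_code}"}]
    for _ in range(max_turns):
        patch = chat(history)                 # model's next attempt
        history.append({"role": "assistant", "content": patch})
        ok, feedback = run_tests(patch)       # validate the patch
        if ok:
            return patch
        history.append({"role": "user",
                        "content": f"The patch failed: {feedback}\n"
                                   "Do not repeat earlier patches."})
    return None
```

Keeping every attempt in the window is what lets the model both avoid repeating failures and ground its next patch in concrete test feedback.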