SOEVAL is created by us by mining questions from StackOverflow. Our goal was to create a prompt dataset that reflects the real-life needs of software developers. To build this dataset, we first collected 500 popular and recent questions with Python and Java tags for each. From these 1,000 questions, we applied a set of inclusion and exclusion criteria. The inclusion criteria were: the question has to (1) explicitly ask “how to do X” in Python or Java; (2) include code in its body; (3) have an accepted answer that includes code. We excluded questions that were (1) open-ended and asking for best practices/guidelines for a specific problem in Python/Java; (2) related to finding a specific API/module for a given task; (3) related to errors due to environment configuration (e.g., missing dependency library); (4) related to configuring libraries/API; (5) syntax specific types of questions. By applying the criteria above to these 1K questions, we obtained 28 and 42 prompts for Java and Python, respectively.

Papers


Paper Code Results Date Stars

Dataset Loaders


No data loaders found. You can submit your data loader here.

Tasks


Similar Datasets


License


  • Unknown

Modalities


Languages