60k Stack Overflow Questions: Stack Overflow questions from 2016-2020, classified into three categories by quality

Introduced by Annamoradnejad et al. in "Multi-View Approach to Suggest Moderation Actions in Community Question Answering Sites" (Information Sciences, 2022).

The dataset contains 60,000 Stack Overflow questions from 2016-2020, classified into three categories:

  1. HQ: High-quality posts without a single edit.
  2. LQ_EDIT: Low-quality posts with a negative score and multiple community edits that nevertheless remain open after those edits.
  3. LQ_CLOSE: Low-quality posts that were closed by the community without a single edit.
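For classification experiments, the three category names above are usually mapped to integer class ids. The following is a minimal sketch, assuming the labels appear verbatim as the strings `HQ`, `LQ_EDIT`, and `LQ_CLOSE`. The integer order and the helper names (`encode_labels`, `class_counts`) are illustrative choices, not part of the dataset.

```python
from collections import Counter

# Category names as listed in the dataset description.
# The integer order is an arbitrary, illustrative choice.
LABELS = ("HQ", "LQ_EDIT", "LQ_CLOSE")
LABEL_TO_ID = {name: i for i, name in enumerate(LABELS)}


def encode_labels(labels):
    """Map category strings to integer ids, rejecting unknown names."""
    ids = []
    for label in labels:
        if label not in LABEL_TO_ID:
            raise ValueError(f"unknown category: {label!r}")
        ids.append(LABEL_TO_ID[label])
    return ids


def class_counts(labels):
    """Count questions per category, reporting 0 for absent categories."""
    counts = Counter(labels)
    return {name: counts.get(name, 0) for name in LABELS}


print(encode_labels(["HQ", "LQ_CLOSE", "LQ_EDIT"]))  # [0, 2, 1]
```

Raising on unknown names catches typos or unexpected values early, before they silently become a fourth class.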

Notes

  • Questions are sorted by question Id.
  • Question bodies are in HTML format.
  • All dates are in UTC.
  • The dataset is also accessible at https://www.kaggle.com/imoore/60k-stack-overflow-questions-with-quality-rate
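Question bodies are stored as HTML and dates are in UTC. A text-classification pipeline therefore usually needs to strip the markup and attach an explicit time zone to each timestamp. The following is a minimal sketch that uses only the Python standard library. The helper names, the set of block-level tags, and the `%Y-%m-%d %H:%M:%S` timestamp format are assumptions for illustration, so check them against the actual files before relying on them.

```python
from datetime import datetime, timezone
from html.parser import HTMLParser

# Tags treated as word boundaries (an illustrative, not exhaustive, choice).
BLOCK_TAGS = {"p", "pre", "li", "ul", "ol", "br", "div", "blockquote",
              "h1", "h2", "h3", "h4", "h5", "h6"}


class _TextExtractor(HTMLParser):
    """Collect the text content of an HTML fragment, dropping all tags."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: entities like &lt; are decoded
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def body_to_text(html_body):
    """Convert an HTML question body to plain text with collapsed whitespace."""
    parser = _TextExtractor()
    parser.feed(html_body)
    parser.close()  # flush any buffered trailing text
    return " ".join("".join(parser.parts).split())


def parse_utc(timestamp, fmt="%Y-%m-%d %H:%M:%S"):
    """Parse a naive timestamp string and mark it as UTC.

    The default format is an assumption about how dates appear in the files.
    """
    return datetime.strptime(timestamp, fmt).replace(tzinfo=timezone.utc)


print(body_to_text("<p>Use <code>a &lt; b</code> here.</p><p>Thanks</p>"))
# Use a < b here. Thanks
print(parse_utc("2016-01-01 00:21:59").isoformat())
# 2016-01-01T00:21:59+00:00
```

Treating block-level tags as word boundaries keeps adjacent paragraphs from fusing into a single token. Inline tags such as `<code>` are dropped without inserting spaces, so surrounding punctuation stays attached.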

How to cite

This is an original dataset, published under the MIT License. If you use it, please cite it as follows:

@article{annamoradnejad2022multiview,
  title   = {Multi-View Approach to Suggest Moderation Actions in Community Question Answering Sites},
  author  = {Annamoradnejad, Issa and Habibi, Jafar and Fazli, Mohammadamin},
  journal = {Information Sciences},
  volume  = {600},
  pages   = {144--154},
  year    = {2022},
  issn    = {0020-0255},
  doi     = {10.1016/j.ins.2022.03.085},
  url     = {https://www.sciencedirect.com/science/article/pii/S0020025522003127}
}