The Reddit dataset is a graph dataset from Reddit posts made in the month of September, 2014. The node label in this case is the community, or “subreddit”, that a post belongs to. 50 large communities have been sampled to build a post-to-post graph, connecting posts if the same user comments on both. In total this dataset contains 232,965 posts with an average degree of 492. The first 20 days are used for training and the remaining days for testing (with 30% used for validation). For features, off-the-shelf 300-dimensional GloVe CommonCrawl word vectors are used.
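The post-to-post construction and the day-based split described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the dataset authors' preprocessing code: the input format (a list of `(user, post_id)` comment pairs and a `post_day` mapping from post to day of month) and the function names `build_post_graph`, `average_degree`, and `split_by_day` are hypothetical, and the toy data is made up.

```python
from collections import defaultdict
from itertools import combinations


def build_post_graph(comments):
    """Connect two posts whenever the same user commented on both.

    comments: iterable of (user, post_id) pairs (assumed input format).
    Returns a set of undirected edges stored as sorted (post_a, post_b) tuples.
    """
    posts_by_user = defaultdict(set)
    for user, post in comments:
        posts_by_user[user].add(post)

    edges = set()
    for posts in posts_by_user.values():
        # Every pair of posts sharing this commenter gets an edge.
        for a, b in combinations(sorted(posts), 2):
            edges.add((a, b))
    return edges


def average_degree(edges, num_nodes):
    """Average degree of an undirected graph: 2 * |E| / |V|."""
    return 2 * len(edges) / num_nodes if num_nodes else 0.0


def split_by_day(post_day, train_days=20):
    """Posts from the first `train_days` days go to training, the rest are held out."""
    train = sorted(p for p, d in post_day.items() if d <= train_days)
    held_out = sorted(p for p, d in post_day.items() if d > train_days)
    return train, held_out


# Toy example (hypothetical data).
comments = [("u1", "p1"), ("u1", "p2"), ("u2", "p2"), ("u2", "p3"), ("u3", "p4")]
post_day = {"p1": 3, "p2": 12, "p3": 25, "p4": 28}

edges = build_post_graph(comments)
train_posts, held_out_posts = split_by_day(post_day)
print(sorted(edges))
print(average_degree(edges, num_nodes=len(post_day)))
print(train_posts, held_out_posts)
```

The held-out posts would then be divided further into test and validation sets; the sketch leaves that step out because the description does not say how the 30% validation portion is selected.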
603 PAPERS • 13 BENCHMARKS
MIcrosoft News Dataset (MIND) is a large-scale dataset for news recommendation research. It was collected from anonymized behavior logs of the Microsoft News website. The mission of MIND is to serve as a benchmark dataset for news recommendation and to facilitate research in news recommendation and recommender systems.
130 PAPERS • 1 BENCHMARK
A dataset consisting of 46 recipient users and 26,180 tweets. The dataset includes the news feed of each user and 13 features that may influence the relevance of the tweets.
2 PAPERS • NO BENCHMARKS YET
A dataset of news articles posted in the r/Liberal and r/Conservative subreddits. In total, we collected a corpus of 226,010 articles to understand political expression through shared news articles.
1 PAPER • 1 BENCHMARK
V-MIND extends the MIND dataset with news images.
1 PAPER • NO BENCHMARKS YET