The Memetracker corpus contains articles from mainstream media and blogs published between August 1 and October 31, 2008, at a rate of about 1 million documents per day. It includes 10,967 hyperlink cascades among 600 media sites.
37 PAPERS • NO BENCHMARKS YET
BillSum is the first dataset for summarization of US Congressional and California state bills.
36 PAPERS • 3 BENCHMARKS