Dataset Summary
The DiscoEval is an English-language Benchmark that contains a test suite of 7 tasks to evaluate whether sentence representations include semantic information relevant to discourse processing. The benchmark datasets offer a collection of tasks designed to evaluate natural language understanding models in the context of discourse analysis and coherence.
Dataset Sources
Arxiv: A repository of scientific papers and research articles. Wikipedia: An extensive online encyclopedia with articles on diverse topics. Rocstory: A dataset consisting of fictional stories. Ubuntu IRC channel: Conversational data extracted from the Ubuntu Internet Relay Chat (IRC) channel. PeerRead: A dataset of scientific papers frequently used for discourse-related tasks. RST Discourse Treebank: A dataset annotated with Rhetorical Structure Theory (RST) discourse relations. Penn Discourse Treebank: Another dataset with annotated discourse relations, facilitating the study of discourse structure
Paper | Code | Results | Date | Stars |
---|