DiscoEval (Discourse Evaluation)

Introduced by Bamberger et al. in DEPTH: Discourse Education through Pre-Training Hierarchically

Dataset Summary

DiscoEval is an English-language benchmark comprising a test suite of 7 tasks that evaluate whether sentence representations capture semantic information relevant to discourse processing. The tasks are designed to assess natural language understanding models on discourse analysis and coherence.
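One common way to test whether representations carry a given kind of information is to train a small classifier (a "probe") on top of frozen embeddings and measure its accuracy. The sketch below illustrates that general idea only; it is not the DiscoEval evaluation code. The two-dimensional vectors, the labels, and the function names `train_linear_probe` and `probe_accuracy` are hypothetical stand-ins for real encoder outputs and task labels.

```python
# Toy probing sketch: a perceptron trained on frozen "sentence embeddings".
# All vectors, labels, and function names are illustrative assumptions,
# not DiscoEval data or DiscoEval code.

def predict(weights, bias, vec):
    """Return 1 if the linear score is positive, else 0."""
    score = sum(w * x for w, x in zip(weights, vec)) + bias
    return 1 if score > 0 else 0

def train_linear_probe(embeddings, labels, epochs=20, lr=0.1):
    """Fit a perceptron on fixed embeddings; the encoder itself is never updated."""
    weights = [0.0] * len(embeddings[0])
    bias = 0.0
    for _ in range(epochs):
        for vec, label in zip(embeddings, labels):
            error = label - predict(weights, bias, vec)
            if error != 0:
                # Standard perceptron update on a misclassified example.
                weights = [w + lr * error * x for w, x in zip(weights, vec)]
                bias += lr * error
    return weights, bias

def probe_accuracy(weights, bias, embeddings, labels):
    """Fraction of examples the probe classifies correctly."""
    correct = sum(predict(weights, bias, v) == y for v, y in zip(embeddings, labels))
    return correct / len(labels)

# Hypothetical frozen embeddings with a binary discourse label
# (for example, "coherent" = 1 versus "incoherent" = 0).
train_x = [[1.0, 0.2], [0.9, 0.1], [0.1, 0.8], [0.2, 1.0]]
train_y = [1, 1, 0, 0]
test_x = [[0.8, 0.3], [0.3, 0.9]]
test_y = [1, 0]

weights, bias = train_linear_probe(train_x, train_y)
test_acc = probe_accuracy(weights, bias, test_x, test_y)
print(f"probe accuracy on held-out toy data: {test_acc:.2f}")
```

High probe accuracy on held-out examples suggests the frozen representations encode the property being tested; because the probe is deliberately simple, the result reflects the embeddings more than the classifier.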

Dataset Sources

Arxiv: A repository of scientific papers and research articles.
Wikipedia: An extensive online encyclopedia with articles on diverse topics.
Rocstory: A dataset consisting of fictional stories.
Ubuntu IRC channel: Conversational data extracted from the Ubuntu Internet Relay Chat (IRC) channel.
PeerRead: A dataset of scientific papers frequently used for discourse-related tasks.
RST Discourse Treebank: A dataset annotated with Rhetorical Structure Theory (RST) discourse relations.
Penn Discourse Treebank: Another dataset with annotated discourse relations, facilitating the study of discourse structure.




