DialogSum is a large-scale dialogue summarization dataset, consisting of 13,460 dialogues with corresponding manually labeled summaries and topics.
38 PAPERS • 2 BENCHMARKS
For goal-oriented document-grounded dialogs, it often involves complex contexts for identifying the most relevant information, which requires better understanding of the inter-relations between conversations and documents. Meanwhile, many online user-oriented documents use both semi-structured and unstructured contents for guiding users to access information of different contexts. Thus, we create a new goal-oriented document-grounded dialogue dataset that captures more diverse scenarios derived from various document contents from multiple domains such ssa.gov and studentaid.gov. For data collection, we propose a novel pipeline approach for dialogue data construction, which has been adapted and evaluated for several domains.
34 PAPERS • NO BENCHMARKS YET
MultiDoc2Dial is a new task and dataset on modeling goal-oriented dialogues grounded in multiple documents. Most previous works treat document-grounded dialogue modeling as a machine reading comprehension task based on a single given document or passage. We aim to address more realistic scenarios where a goal-oriented information-seeking conversation involves multiple topics, and hence is grounded on different documents.
20 PAPERS • NO BENCHMARKS YET
Most existing dialogue systems fail to respond properly to potentially unsafe user utterances by either ignoring or passively agreeing with them.
13 PAPERS • 1 BENCHMARK
FaithDial is a new benchmark for hallucination-free dialogues, by editing hallucinated responses in the Wizard of Wikipedia (WoW) benchmark.
12 PAPERS • NO BENCHMARKS YET
FusedChat is an inter-mode dialogue dataset. It contains dialogue sessions fusing task-oriented dialogues (TOD) and open-domain dialogues (ODD). Based on MultiWOZ, FusedChat appends or prepends an ODD to every existing TOD. See more details in the paper.
6 PAPERS • 1 BENCHMARK
OTTers is a dataset of human one-turn topic transitions. In this task, models must connect two topics in a cooperative and coherent manner, by generating a "bridging" utterance connecting the new topic tot he topic of the previous conversation turn.
6 PAPERS • NO BENCHMARKS YET
MultiRefKGC is a dataset created from conversations from Reddit designed for Knowledge-Grounded Dialogue Generation tasks.
1 PAPER • NO BENCHMARKS YET
OpenViDial 2.0 is a larger-scale open-domain multi-modal dialogue dataset compared to the previous version OpenViDial 1.0. OpenViDial 2.0 contains a total number of 5.6 million dialogue turns extracted from either movies or TV series from different resources, and each dialogue turn is paired with its corresponding visual context.
1 PAPER • 1 BENCHMARK
The StatCan Dialogue Dataset: Retrieving Data Tables through Conversations with Genuine Intents