10 dataset results for Dialogue Generation AND Texts AND English

DialogSum

DialogSum is a large-scale dialogue summarization dataset, consisting of 13,460 dialogues with corresponding manually labeled summaries and topics.

38 PAPERS • 2 BENCHMARKS

Doc2Dial

Doc2Dial (Doc2Dial: Document-grounded Dialogue)

Goal-oriented document-grounded dialogues often involve complex contexts in which identifying the most relevant information requires a better understanding of the inter-relations between conversations and documents. Meanwhile, many online user-oriented documents combine semi-structured and unstructured content to guide users to information in different contexts. We therefore create a new goal-oriented document-grounded dialogue dataset that captures more diverse scenarios derived from varied document content across multiple domains, such as ssa.gov and studentaid.gov. For data collection, we propose a novel pipeline approach for dialogue data construction, which has been adapted and evaluated for several domains.

34 PAPERS • NO BENCHMARKS YET

MultiDoc2Dial

MultiDoc2Dial (MultiDoc2Dial: Modeling Dialogues Grounded in Multiple Documents)

MultiDoc2Dial is a new task and dataset on modeling goal-oriented dialogues grounded in multiple documents. Most previous work treats document-grounded dialogue modeling as a machine reading comprehension task based on a single given document or passage. We aim to address more realistic scenarios where a goal-oriented information-seeking conversation involves multiple topics and is hence grounded in different documents.

20 PAPERS • NO BENCHMARKS YET
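To make "grounded in multiple documents" concrete, the sketch below summarizes how a single dialogue moves between grounding documents. It assumes a hypothetical per-turn list of document IDs (with None for ungrounded turns); the IDs and the function name `grounding_summary` are illustrative and not the official MultiDoc2Dial schema or tooling.

```python
from typing import List, Optional


def grounding_summary(doc_ids: List[Optional[str]]) -> dict:
    """Summarize how a dialogue moves across grounding documents.

    doc_ids holds, per turn, the ID of the document that turn is grounded in,
    or None for turns with no grounding (hypothetical schema).
    """
    # Keep only grounded turns, in dialogue order.
    grounded = [d for d in doc_ids if d is not None]
    # Count consecutive grounded turns whose document differs (a topic/document shift).
    switches = sum(1 for prev, cur in zip(grounded, grounded[1:]) if prev != cur)
    return {"distinct_documents": len(set(grounded)), "document_switches": switches}


# Hypothetical document IDs for a six-turn dialogue.
turns = ["doc_benefits", None, "doc_benefits", "doc_disability", None, "doc_disability"]
print(grounding_summary(turns))  # {'distinct_documents': 2, 'document_switches': 1}
```

A single-document setting, as in most earlier work, would always yield one distinct document and zero switches; the multi-document setting is what allows non-zero switch counts.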

ProsocialDialog

Most existing dialogue systems fail to respond properly to potentially unsafe user utterances by either ignoring or passively agreeing with them.

13 PAPERS • 1 BENCHMARK

FaithDial

FaithDial is a new benchmark for hallucination-free dialogues, created by editing hallucinated responses in the Wizard of Wikipedia (WoW) benchmark.

12 PAPERS • NO BENCHMARKS YET

FusedChat

FusedChat is an inter-mode dialogue dataset. It contains dialogue sessions fusing task-oriented dialogues (TOD) and open-domain dialogues (ODD). Based on MultiWOZ, FusedChat appends or prepends an ODD to every existing TOD. See more details in the paper.

6 PAPERS • 1 BENCHMARK
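The append/prepend construction described for FusedChat can be sketched as follows. This is a minimal illustration assuming a hypothetical `Turn` record tagged with its mode; it only shows where the ODD segment is placed relative to the TOD and does not model how the open-domain content is written to fit the task-oriented dialogue.

```python
from dataclasses import dataclass
from typing import List, Literal


@dataclass
class Turn:
    """One utterance, tagged with the dialogue mode it belongs to (hypothetical schema)."""
    speaker: str                 # "user" or "system"
    text: str
    mode: Literal["TOD", "ODD"]  # task-oriented vs open-domain


def fuse(tod: List[Turn], odd: List[Turn], position: str) -> List[Turn]:
    """Return one session with the ODD segment placed before or after the TOD."""
    if position == "prepend":
        return odd + tod
    if position == "append":
        return tod + odd
    raise ValueError(f"position must be 'prepend' or 'append', got {position!r}")


# Made-up example turns, not taken from FusedChat or MultiWOZ.
tod = [
    Turn("user", "I need a cheap hotel in the city centre.", "TOD"),
    Turn("system", "There is a cheap guesthouse in the centre. Shall I book it?", "TOD"),
]
odd = [
    Turn("user", "I'm exhausted after such a long trip.", "ODD"),
    Turn("system", "Long journeys can be draining. I hope you get some rest.", "ODD"),
]

session = fuse(tod, odd, "prepend")
print([t.mode for t in session])  # ['ODD', 'ODD', 'TOD', 'TOD']
```

Keeping the mode label on each turn makes it easy to check, per session, where the switch between open-domain and task-oriented dialogue occurs.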

OTTers

OTTers is a dataset of human one-turn topic transitions. In this task, models must connect two topics in a cooperative and coherent manner by generating a "bridging" utterance that connects the new topic to the topic of the previous conversation turn.

6 PAPERS • NO BENCHMARKS YET
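One way to frame the OTTers task for a conditional generation model is sketched below: the previous turn and the new topic form the input, and the human bridging utterance is the target. The record fields, the `to_seq2seq` helper, and the "previous: … | new topic: …" input format are hypothetical choices for illustration, not the dataset's released format.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TransitionExample:
    """Hypothetical record for one OTTers-style topic transition."""
    previous_turn: str  # utterance that establishes the current topic
    new_topic: str      # topic the conversation should move to
    bridge: str         # human-written one-turn transition (generation target)


def to_seq2seq(example: TransitionExample, sep: str = " | ") -> Tuple[str, str]:
    """Build a (source, target) pair for a conditional generation model."""
    source = f"previous: {example.previous_turn}{sep}new topic: {example.new_topic}"
    return source, example.bridge


# Made-up example, not drawn from OTTers.
ex = TransitionExample(
    previous_turn="I spent the weekend hiking in the mountains.",
    new_topic="photography",
    bridge="The views must have been amazing. Did you take any photos?",
)
src, tgt = to_seq2seq(ex)
print(src)  # previous: I spent the weekend hiking in the mountains. | new topic: photography
```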

MultiRefKGC (multi-reference KGC)

MultiRefKGC is a dataset built from Reddit conversations and designed for knowledge-grounded dialogue generation tasks.

1 PAPER • NO BENCHMARKS YET

OpenViDial 2.0

OpenViDial 2.0 is a larger-scale open-domain multi-modal dialogue dataset than the previous version, OpenViDial 1.0. OpenViDial 2.0 contains a total of 5.6 million dialogue turns extracted from movies or TV series from different sources, and each dialogue turn is paired with its corresponding visual context.

1 PAPER • 1 BENCHMARK

StatCan Dialogue Dataset

The StatCan Dialogue Dataset: Retrieving Data Tables through Conversations with Genuine Intents

1 PAPER • 1 BENCHMARK
