SG-NLG (Schema-Guided Natural Language Generation)

Introduced by Du et al. in Schema-Guided Natural Language Generation

The SG-NLG dataset is a pre-processed version of the DSTC8 Schema-Guided Dialogue (SGD) dataset, designed specifically for data-to-text Natural Language Generation (NLG). The original DSTC8 SGD dataset contains ~20,000 dialogues spanning ~20 domains.

The SG-NLG dataset is designed to make it easier to conduct NLG experiments on the SGD data. It pre-processes the SGD data by pairing the schema for each system turn with the corresponding set of natural language strings that realize it. It also “delexicalizes” the prompts (replaces the relevant values with fixed names), converting them into templates that are more generic for use within a dialog system.
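The delexicalization step can be illustrated with a minimal Python sketch. It is not the dataset authors' pre-processing code: the `delexicalize` function, the `@slot_name` placeholder format, and the example meaning representation are all assumptions made for illustration, and the actual placeholder convention used in SG-NLG may differ.

```python
import re


def delexicalize(utterance, slot_values):
    """Replace each slot value found in the utterance with a fixed placeholder.

    The "@slot_name" placeholder format is hypothetical, chosen for illustration.
    """
    template = utterance
    # Replace longer values first so a short value cannot clobber part of a longer one.
    ordered = sorted(slot_values.items(), key=lambda kv: len(kv[1]), reverse=True)
    for slot, value in ordered:
        placeholder = "@" + slot
        pattern = re.compile(re.escape(value), flags=re.IGNORECASE)
        # A function replacement avoids any escape processing of the placeholder.
        template = pattern.sub(lambda _match, p=placeholder: p, template)
    return template


# Hypothetical system turn: a schema-like record paired with one realization.
mr = {"act": "OFFER", "slots": {"restaurant_name": "Sushi Yoshi", "city": "San Jose"}}
utterance = "How about Sushi Yoshi in San Jose?"

print(delexicalize(utterance, mr["slots"]))
# prints: How about @restaurant_name in @city?
```

Because the resulting template no longer mentions a specific restaurant or city, it can be reused for any system turn with the same act and slots by substituting new values for the placeholders.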

The final SG-NLG dataset comprises nearly 4K meaning representations (MRs) and over 140K templates.

Source: The Schema-Guided Natural Language Generation (SG-NLG) Dataset



