Data-to-Text Generation
107 papers with code • 24 benchmarks • 22 datasets
A classic problem in natural-language generation (NLG) is taking structured data, such as a table, as input and producing text that adequately and fluently describes it. Unlike machine translation, which aims for complete transduction of the source sentence, this form of NLG is usually taken to involve (at least) two separate challenges: what to say, the selection of an appropriate subset of the input data to discuss, and how to say it, the surface realization of the selected content.
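The two stages can be illustrated with a minimal sketch. The record fields and the salience heuristic below are hypothetical; real systems learn both content selection and realization from data.

```python
# Toy two-stage data-to-text pipeline: content selection, then realization.
# Record fields ("entity", "value") and the salience rule are illustrative.

def select_content(records, max_records=2):
    """'What to say': pick a salient subset of the input records."""
    # Toy salience heuristic: prefer records with larger numeric values.
    ranked = sorted(records, key=lambda r: r["value"], reverse=True)
    return ranked[:max_records]

def realize(records):
    """'How to say it': render the selected records as fluent text."""
    phrases = [f"{r['entity']} scored {r['value']} points" for r in records]
    return " and ".join(phrases) + "."

table = [
    {"entity": "LeBron James", "value": 31},
    {"entity": "Kevin Love", "value": 18},
    {"entity": "J.R. Smith", "value": 4},
]
print(realize(select_content(table)))
# -> LeBron James scored 31 points and Kevin Love scored 18 points.
```

Neural models typically replace both hand-written stages with learned components, but the what-to-say / how-to-say-it decomposition remains a useful way to analyze their errors.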
(Image credit: Data-to-Text Generation with Content Selection and Planning)
Libraries
Use these libraries to find Data-to-Text Generation models and implementations.

Most implemented papers
Transition-Based Deep Input Linearization
Traditional methods for deep NLG adopt pipeline approaches comprising stages such as constructing syntactic input, predicting function words, linearizing the syntactic input and generating the surface forms.
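The pipeline stages named above can be sketched as a chain of functions. Every stage implementation here is a toy stub standing in for a learned model, not the paper's method.

```python
# Hedged sketch of the traditional deep-NLG pipeline; each stage is a
# hand-written stub where a real system would use a trained component.

def build_syntactic_input(semantics):
    """Stage 1: map an input meaning to an unordered syntactic structure."""
    return {"head": semantics["verb"],
            "deps": [semantics["agent"], semantics["patient"]]}

def predict_function_words(tree):
    """Stage 2: add function words (here, just determiners)."""
    tree["deps"] = [("the", dep) for dep in tree["deps"]]
    return tree

def linearize(tree):
    """Stage 3: order the words (toy subject-verb-object ordering)."""
    (det1, subj), (det2, obj) = tree["deps"]
    return [det1, subj, tree["head"], det2, obj]

def realize_surface(tokens):
    """Stage 4: produce surface forms (stubbed: join and punctuate)."""
    return " ".join(tokens).capitalize() + "."

sem = {"verb": "chased", "agent": "dog", "patient": "cat"}
sentence = realize_surface(linearize(predict_function_words(build_syntactic_input(sem))))
print(sentence)  # -> The dog chased the cat.
```

The paper's transition-based approach replaces this cascade with a single incremental system, avoiding the error propagation that such fixed pipelines suffer from.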
Semantic Noise Matters for Neural Natural Language Generation
Neural natural language generation (NNLG) systems are known for their pathological outputs, i.e., generating text which is unrelated to the input specification.
A Hierarchical Model for Data-to-Text Generation
Linearizing the input records into a flat sequence, however, loses most of the structure contained in the data.
Revisiting Challenges in Data-to-Text Generation with Fact Grounding
Data-to-text generation models face challenges in ensuring data fidelity by referring to the correct input source.
Modeling Global and Local Node Contexts for Text Generation from Knowledge Graphs
Recent graph-to-text models generate text from graph-based data using either global or local aggregation to learn node representations.
Variational Template Machine for Data-to-Text Generation
We propose the variational template machine (VTM), a novel method to generate text descriptions from data tables.
Have Your Text and Use It Too! End-to-End Neural Data-to-Text Generation with Semantic Fidelity
Our generated text has a significantly better semantic fidelity than the state of the art across all four datasets.
ToTTo: A Controlled Table-To-Text Generation Dataset
We present ToTTo, an open-domain English table-to-text dataset with over 120,000 training examples that proposes a controlled generation task: given a Wikipedia table and a set of highlighted table cells, produce a one-sentence description.
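A ToTTo-style example can be sketched as follows. The field names and table contents below are illustrative, not the dataset's exact schema.

```python
# Hedged sketch of a controlled table-to-text example in the style of ToTTo;
# field names ("table", "highlighted_cells", "target") are assumptions.

example = {
    "table": [
        ["Year", "Title", "Role"],
        ["2008", "Iron Man", "Tony Stark"],
        ["2012", "The Avengers", "Tony Stark"],
    ],
    # (row, col) indices of the cells the one-sentence description must cover.
    "highlighted_cells": [(1, 1), (1, 2)],
    "target": "Tony Stark appeared in Iron Man.",
}

def highlighted_values(ex):
    """Collect the cell values a model is asked to describe."""
    return [ex["table"][r][c] for r, c in ex["highlighted_cells"]]

print(highlighted_values(example))  # -> ['Iron Man', 'Tony Stark']
```

Restricting the description to the highlighted cells is what makes the task "controlled": it fixes the what-to-say decision and evaluates models chiefly on realization and fidelity.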
GPT-too: A language-model-first approach for AMR-to-text generation
Abstract Meaning Representations (AMRs) are broad-coverage sentence-level semantic graphs.
Partially-Aligned Data-to-Text Generation with Distant Supervision
Partially-aligned data of this kind is much easier to obtain, since it can be produced automatically.