Data-to-Text Generation
105 papers with code • 24 benchmarks • 22 datasets
A classic problem in natural-language generation (NLG) is to take structured data, such as a table, as input and produce text that adequately and fluently describes that data as output. Unlike machine translation, which aims at complete transduction of the input sentence, this form of NLG is usually taken to involve (at least) two separate challenges: what to say, the selection of an appropriate subset of the input data to discuss, and how to say it, the surface realization of that content.
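The two-stage view above (content selection, then surface realization) can be illustrated with a toy sketch. The salience scores, field names, and template-based realizer below are purely illustrative stand-ins for learned models:

```python
# Toy sketch of the two classic data-to-text stages, using hand-written
# rules in place of learned models. All field names and scores are illustrative.

def select_content(records, max_facts=2):
    """'What to say': pick a subset of input records worth mentioning."""
    # Illustrative heuristic: keep the highest-salience records.
    return sorted(records, key=lambda r: r["salience"], reverse=True)[:max_facts]

def realize(records):
    """'How to say it': render the selected records as fluent text."""
    clauses = [f"{r['entity']} {r['attribute']} is {r['value']}" for r in records]
    return " and ".join(clauses).capitalize() + "."

table = [
    {"entity": "the team", "attribute": "score", "value": "102", "salience": 0.9},
    {"entity": "the arena", "attribute": "capacity", "value": "19000", "salience": 0.2},
    {"entity": "the star player", "attribute": "point total", "value": "35", "salience": 0.8},
]

print(realize(select_content(table)))
# The team score is 102 and the star player point total is 35.
```

In real systems both stages are learned (jointly or as an explicit plan-then-realize pipeline), but the division of labor is the same.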
(Image credit: Data-to-Text Generation with Content Selection and Planning)
Libraries
Use these libraries to find Data-to-Text Generation models and implementations.
Latest papers
Bridging the Gap between Different Vocabularies for LLM Ensemble
Ensembling different large language models (LLMs) to unleash their complementary potential and harness their individual strengths is highly valuable.
Prompting for Numerical Sequences: A Case Study on Market Comment Generation
Large language models (LLMs) have been applied to a wide range of data-to-text generation tasks, including tables, graphs, and time-series numerical data-to-text settings.
Triples-to-isiXhosa (T2X): Addressing the Challenges of Low-Resource Agglutinative Data-to-Text Generation
In this paper we tackle data-to-text for isiXhosa, which is low-resource and agglutinative.
High-quality Data-to-Text Generation for Severely Under-Resourced Languages with Out-of-the-box Large Language Models
The performance of NLP methods for severely under-resourced languages cannot currently hope to match the state of the art for well-resourced languages.
Self-training from Self-memory in Data-to-text Generation
The quality of self-memory is validated with two models, data-to-text (D2T) and text-to-data (T2D), under two pre-defined conditions: (1) all source values appear in the outputs of the D2T model, and (2) the outputs can be converted back to the source data by the T2D model.
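Condition (1) in the snippet above, that every source value must appear in the generated text, reduces to a simple membership check. The field names and example data below are illustrative, not from the paper:

```python
# Minimal sketch of condition (1): every value in the source data must
# appear verbatim in the D2T model's output. Example fields are illustrative.

def covers_all_source_values(source: dict, generated: str) -> bool:
    """Return True iff every source value occurs in the generated text."""
    return all(str(v) in generated for v in source.values())

source = {"name": "Aromi", "eatType": "coffee shop", "rating": "5 out of 5"}

good = "Aromi is a coffee shop rated 5 out of 5."
bad = "Aromi is a coffee shop."

print(covers_all_source_values(source, good))  # True
print(covers_all_source_values(source, bad))   # False
```

Condition (2) requires running the T2D model on the output and comparing the parsed records to the source, which cannot be reduced to string matching.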
Unifying Structured Data as Graph for Data-to-Text Pre-Training
In this paper, we unify different types of structured data (i.e., table, key-value data, knowledge graph) into the graph format and cast different data-to-text generation tasks as graph-to-text generation.
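The unification step can be sketched as casting each input type into a shared set of (subject, relation, object) triples; knowledge graphs are already in this form. The converters and example data below are assumptions for demonstration, not the paper's actual preprocessing:

```python
# Illustrative sketch: cast key-value data and a table into one shared
# triple (graph) format. Converters and field layout are assumptions.

def kv_to_triples(entity, kv):
    """Key-value data about one entity -> (subject, relation, object) triples."""
    return [(entity, k, v) for k, v in kv.items()]

def table_to_triples(header, rows, key_col=0):
    """Table rows -> triples, using one column as the subject of each row."""
    triples = []
    for row in rows:
        subject = row[key_col]
        for col, value in zip(header, row):
            if col != header[key_col]:
                triples.append((subject, col, value))
    return triples

g1 = kv_to_triples("Berlin", {"country": "Germany", "population": "3.7M"})
g2 = table_to_triples(["city", "country"], [["Paris", "France"]])
graph = g1 + g2  # one unified triple set, ready for graph-to-text generation
print(graph)
```

Once everything is in triple form, a single graph-to-text model can be pre-trained and applied across all three input types.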
TLM: Token-Level Masking for Transformers
Structured dropout approaches, such as attention dropout and DropHead, have been investigated to regularize the multi-head attention mechanism in Transformers.
ASPIRO: Any-shot Structured Parsing-error-Induced ReprOmpting for Consistent Data-to-Text Generation
We present ASPIRO, an approach for structured data verbalisation into short template sentences in zero to few-shot settings.
Critic-Driven Decoding for Mitigating Hallucinations in Data-to-text Generation
Our method does not need any changes to the underlying LM's architecture or training procedure, and can thus be combined with any model and any decoding method operating on word probabilities.
Data-to-text Generation for Severely Under-Resourced Languages with GPT-3.5: A Bit of Help Needed from Google Translate
LLMs like GPT excel at tasks involving English, which dominates their training data.