Data-to-Text Generation

105 papers with code • 24 benchmarks • 22 datasets

A classic problem in natural-language generation (NLG) is to take structured data, such as a table, as input and produce text that describes the data adequately and fluently. Unlike machine translation, which aims for complete transduction of the source sentence, this form of NLG is usually taken to involve (at least) two separate challenges: what to say, that is, selecting an appropriate subset of the input data to discuss; and how to say it, that is, the surface realization of the selected content.
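The two challenges can be illustrated with a minimal rule-based sketch. It is not taken from any of the papers listed here: the field names, the templates, and the helper functions `select_content` and `realize` are all hypothetical, and learned systems typically replace both steps with neural models.

```python
from typing import Dict, List, Tuple

# Hypothetical templates: one verb phrase per field we know how to verbalise.
TEMPLATES: Dict[str, str] = {
    "points": "scored {value} points",
    "rebounds": "grabbed {value} rebounds",
    "assists": "handed out {value} assists",
}


def select_content(record: Dict[str, object], keep_fields: List[str]) -> List[Tuple[str, object]]:
    """'What to say': keep only the chosen fields, in the planned order."""
    return [(f, record[f]) for f in keep_fields if record.get(f) is not None]


def realize(name: str, selected: List[Tuple[str, object]]) -> str:
    """'How to say it': turn the selected fields into one fluent sentence."""
    phrases = [TEMPLATES[f].format(value=v) for f, v in selected if f in TEMPLATES]
    if not phrases:
        return f"{name} played."
    if len(phrases) == 1:
        body = phrases[0]
    else:
        body = ", ".join(phrases[:-1]) + " and " + phrases[-1]
    return f"{name} {body}."


# Illustrative input row (made-up values, not from any dataset).
row = {"player": "Player A", "points": 28, "rebounds": 9, "assists": 4, "minutes": 35}
selected = select_content(row, ["points", "rebounds"])
sentence = realize(str(row["player"]), selected)
print(sentence)
```

Keeping selection and realization as separate functions mirrors the split described above: changing `keep_fields` alters what is said without touching how it is phrased.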

(Image credit: Data-to-Text Generation with Content Selection and Planning)

Bridging the Gap between Different Vocabularies for LLM Ensemble

xydaytoy/eva 15 Apr 2024

Ensembling different large language models (LLMs) to unleash their complementary potential and harness their individual strengths is highly valuable.

1

Prompting for Numerical Sequences: A Case Study on Market Comment Generation

aistairc/market-reporter 3 Apr 2024

Large language models (LLMs) have been applied to a wide range of data-to-text generation tasks, including tables, graphs, and time-series numerical data-to-text settings.

65

Triples-to-isiXhosa (T2X): Addressing the Challenges of Low-Resource Agglutinative Data-to-Text Generation

francois-meyer/t2x 12 Mar 2024

In this paper we tackle data-to-text for isiXhosa, which is low-resource and agglutinative.

1

High-quality Data-to-Text Generation for Severely Under-Resourced Languages with Out-of-the-box Large Language Models

michelalorandi/d2t-gen-for-under-res-lang-w-llms 19 Feb 2024

The performance of NLP methods for severely under-resourced languages cannot currently hope to match the state of the art in NLP methods for well-resourced languages.

0

Self-training from Self-memory in Data-to-text Generation

hoangthangta/stsm 19 Jan 2024

The quality of self-memory is validated by two models, data-to-text (D2T) and text-to-data (T2D), against two pre-defined conditions: (1) the appearance of all source values in the outputs of the D2T model and (2) the ability to convert back to the source data in the outputs of the T2D model.
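The two conditions can be read as a filter over generated texts. The sketch below is an assumption-laden illustration, not the authors' implementation: `toy_t2d` is a hypothetical rule-based stand-in for a trained T2D model, and the string-matching check for condition (1) is a simplification.

```python
import re
from typing import Callable, Dict


def all_values_present(source: Dict[str, str], text: str) -> bool:
    """Condition (1): every source value appears in the D2T output text."""
    lowered = text.lower()
    return all(str(value).lower() in lowered for value in source.values())


def converts_back(source: Dict[str, str], text: str,
                  t2d: Callable[[str], Dict[str, str]]) -> bool:
    """Condition (2): the T2D output for the text matches the source data."""
    return t2d(text) == source


def accept_as_self_memory(source: Dict[str, str], text: str,
                          t2d: Callable[[str], Dict[str, str]]) -> bool:
    """Keep a generated text only if it passes both conditions."""
    return all_values_present(source, text) and converts_back(source, text, t2d)


def toy_t2d(text: str) -> Dict[str, str]:
    """Hypothetical T2D stand-in: parses 'The <key> is <value>.' clauses."""
    return {key: value for key, value in re.findall(r"The (\w+) is ([^.]+)\.", text)}


# Made-up source record and candidate outputs, for illustration only.
source = {"name": "Alimentum", "food": "Italian"}
faithful = "The name is Alimentum. The food is Italian."
missing_value = "The name is Alimentum."
not_invertible = "Alimentum serves Italian food."

print(accept_as_self_memory(source, faithful, toy_t2d))
print(accept_as_self_memory(source, missing_value, toy_t2d))
print(accept_as_self_memory(source, not_invertible, toy_t2d))
```

The third candidate mentions every source value, so it passes condition (1), but the toy parser cannot recover the record from it, so it fails condition (2).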

0

Unifying Structured Data as Graph for Data-to-Text Pre-Training

alibabaresearch/damo-convai 2 Jan 2024

In this paper, we unify different types of structured data (i.e., table, key-value data, knowledge graph) into the graph format and cast different data-to-text generation tasks as graph-to-text generation.
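One way to picture such a unification is to map every input type to a list of (subject, predicate, object) triples. The sketch below is only an assumption about how this could look, not the paper's actual conversion: the choice of a key column for tables, the function names, and the `<S>/<P>/<O>` linearization markers are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Triple = Tuple[str, str, str]


def key_value_to_graph(subject: str, pairs: Dict[str, object]) -> List[Triple]:
    """Each key-value pair becomes an edge from the described entity."""
    return [(subject, key, str(value)) for key, value in pairs.items()]


def table_to_graph(header: Sequence[str], rows: Sequence[Sequence[object]],
                   key_column: int = 0) -> List[Triple]:
    """Assumed scheme: the key-column cell is the subject of every other cell in its row."""
    triples: List[Triple] = []
    for row in rows:
        subject = str(row[key_column])
        for index, (column, cell) in enumerate(zip(header, row)):
            if index != key_column:
                triples.append((subject, column, str(cell)))
    return triples


def kg_to_graph(triples: Sequence[Tuple[object, object, object]]) -> List[Triple]:
    """Knowledge-graph triples are already edges; only normalise them to strings."""
    return [(str(s), str(p), str(o)) for s, p, o in triples]


def linearize(triples: Sequence[Triple]) -> str:
    """Flatten the graph into one string a sequence model could consume."""
    return " ".join(f"<S> {s} <P> {p} <O> {o}" for s, p, o in triples)


# Made-up inputs of two different types, mapped to the same representation.
from_table = table_to_graph(["team", "wins", "losses"], [["Hawks", 10, 2]])
from_pairs = key_value_to_graph("Hawks", {"wins": 10, "losses": 2})
print(from_table == from_pairs)
print(linearize(from_table))
```

Once every input shares the triple representation, a single graph-to-text model can in principle serve all three input types.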

958

TLM: Token-Level Masking for Transformers

young1993/tlm 28 Oct 2023

Structured dropout approaches, such as attention dropout and DropHead, have been investigated to regularize the multi-head attention mechanism in Transformers.

5

ASPIRO: Any-shot Structured Parsing-error-Induced ReprOmpting for Consistent Data-to-Text Generation

vejvarm/aspiro 27 Oct 2023

We present ASPIRO, an approach for structured data verbalisation into short template sentences in zero to few-shot settings.

0

Critic-Driven Decoding for Mitigating Hallucinations in Data-to-text Generation

langus0/critic-aware-decoding 25 Oct 2023

Our method does not need any changes to the underlying LM's architecture or training procedure and can thus be combined with any model and decoding operating on word probabilities.

5

Data-to-text Generation for Severely Under-Resourced Languages with GPT-3.5: A Bit of Help Needed from Google Translate

dcu-nlg/dcu-nlg-pbn 19 Aug 2023

LLMs like GPT are great at tasks involving English, which dominates their training data.

2