Data-to-Text Generation
105 papers with code • 24 benchmarks • 22 datasets
A classic problem in natural-language generation (NLG) is to take structured data, such as a table, as input and produce text that adequately and fluently describes that data as output. Unlike machine translation, which aims at complete transduction of the input sentence, this form of NLG is usually taken to involve (at least) two separate challenges: what to say, the selection of an appropriate subset of the input data to discuss, and how to say it, the surface realization of that content.
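The two-stage view above (content selection, then surface realization) can be illustrated with a toy sketch. The salience scores, field names, and template-based realizer below are purely illustrative stand-ins for learned models:

```python
# Toy sketch of the two classic data-to-text stages, using hand-written
# rules in place of learned models. All field names and scores are illustrative.

def select_content(records, max_facts=2):
    """'What to say': pick a subset of input records worth mentioning."""
    # Illustrative heuristic: keep the highest-salience records.
    return sorted(records, key=lambda r: r["salience"], reverse=True)[:max_facts]

def realize(records):
    """'How to say it': render the selected records as fluent text."""
    clauses = [f"{r['entity']} {r['attribute']} is {r['value']}" for r in records]
    return " and ".join(clauses).capitalize() + "."

table = [
    {"entity": "the team", "attribute": "score", "value": "102", "salience": 0.9},
    {"entity": "the arena", "attribute": "capacity", "value": "19000", "salience": 0.2},
    {"entity": "the star player", "attribute": "point total", "value": "35", "salience": 0.8},
]

print(realize(select_content(table)))
# The team score is 102 and the star player point total is 35.
```

In real systems both stages are learned (jointly or as an explicit plan-then-realize pipeline), but the division of labor is the same.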
(Image credit: Data-to-Text Generation with Content Selection and Planning)
Libraries
Use these libraries to find Data-to-Text Generation models and implementations.
Latest papers
Bridging the Gap between Different Vocabularies for LLM Ensemble
Ensembling different large language models (LLMs) to unleash their complementary potential and harness their individual strengths is highly valuable.
Prompting for Numerical Sequences: A Case Study on Market Comment Generation
Large language models (LLMs) have been applied to a wide range of data-to-text generation tasks, including tables, graphs, and time-series numerical data-to-text settings.
Triples-to-isiXhosa (T2X): Addressing the Challenges of Low-Resource Agglutinative Data-to-Text Generation
In this paper we tackle data-to-text for isiXhosa, which is low-resource and agglutinative.
High-quality Data-to-Text Generation for Severely Under-Resourced Languages with Out-of-the-box Large Language Models
The performance of NLP methods for severely under-resourced languages cannot currently hope to match the state of the art for well-resourced languages.
Self-training from Self-memory in Data-to-text Generation
The quality of self-memory is validated with two models, data-to-text (D2T) and text-to-data (T2D), under two pre-defined conditions: (1) all source values appear in the outputs of the D2T model, and (2) the outputs can be converted back to the source data by the T2D model.
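Condition (1) in the snippet above, that every source value must appear in the generated text, reduces to a simple membership check. The field names and example data below are illustrative, not from the paper:

```python
# Minimal sketch of condition (1): every value in the source data must
# appear verbatim in the D2T model's output. Example fields are illustrative.

def covers_all_source_values(source: dict, generated: str) -> bool:
    """Return True iff every source value occurs in the generated text."""
    return all(str(v) in generated for v in source.values())

source = {"name": "Aromi", "eatType": "coffee shop", "rating": "5 out of 5"}

good = "Aromi is a coffee shop rated 5 out of 5."
bad = "Aromi is a coffee shop."

print(covers_all_source_values(source, good))  # True
print(covers_all_source_values(source, bad))   # False
```

Condition (2) requires running the T2D model on the output and comparing the parsed records to the source, which cannot be reduced to string matching.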
Unifying Structured Data as Graph for Data-to-Text Pre-Training
In this paper, we unify different types of structured data (i.e., table, key-value data, knowledge graph) into the graph format and cast different data-to-text generation tasks as graph-to-text generation.
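The unification step can be sketched as casting each input type into a shared set of (subject, relation, object) triples; knowledge graphs are already in this form. The converters and example data below are assumptions for demonstration, not the paper's actual preprocessing:

```python
# Illustrative sketch: cast key-value data and a table into one shared
# triple (graph) format. Converters and field layout are assumptions.

def kv_to_triples(entity, kv):
    """Key-value data about one entity -> (subject, relation, object) triples."""
    return [(entity, k, v) for k, v in kv.items()]

def table_to_triples(header, rows, key_col=0):
    """Table rows -> triples, using one column as the subject of each row."""
    triples = []
    for row in rows:
        subject = row[key_col]
        for col, value in zip(header, row):
            if col != header[key_col]:
                triples.append((subject, col, value))
    return triples

g1 = kv_to_triples("Berlin", {"country": "Germany", "population": "3.7M"})
g2 = table_to_triples(["city", "country"], [["Paris", "France"]])
graph = g1 + g2  # one unified triple set, ready for graph-to-text generation
print(graph)
```

Once everything is in triple form, a single graph-to-text model can be pre-trained and applied across all three input types.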
TLM: Token-Level Masking for Transformers
Structured dropout approaches, such as attention dropout and DropHead, have been investigated to regularize the multi-head attention mechanism in Transformers.
ASPIRO: Any-shot Structured Parsing-error-Induced ReprOmpting for Consistent Data-to-Text Generation
We present ASPIRO, an approach for structured data verbalisation into short template sentences in zero to few-shot settings.
Critic-Driven Decoding for Mitigating Hallucinations in Data-to-text Generation
Our method does not need any changes to the underlying LM's architecture or training procedure, and can thus be combined with any model and any decoding method operating on word probabilities.
Data-to-text Generation for Severely Under-Resourced Languages with GPT-3.5: A Bit of Help Needed from Google Translate
LLMs like GPT excel at tasks involving English, which dominates their training data.