Data-to-Text Generation

104 papers with code • 24 benchmarks • 22 datasets

A classic problem in natural-language generation (NLG) is to take structured data, such as a table, as input and produce text that adequately and fluently describes that data as output. Unlike machine translation, which aims for complete transduction of the sentence to be translated, this form of NLG is usually taken to require addressing (at least) two separate challenges: what to say (selecting an appropriate subset of the input data to discuss) and how to say it (the surface realization of a generation).
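
The two challenges can be illustrated with a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: the (entity, attribute, value) record layout, the `select_content` filter, and the template in `realize` are assumptions, not the method of any paper listed on this page.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: (entity, attribute, value)
Record = Tuple[str, str, str]


def select_content(records: List[Record], wanted: List[str]) -> List[Record]:
    """'What to say': keep only records whose attribute is in `wanted`."""
    return [r for r in records if r[1] in wanted]


def realize(records: List[Record]) -> str:
    """'How to say it': a naive template-based surface realizer."""
    by_entity: Dict[str, List[str]] = {}
    for entity, attribute, value in records:
        # Group facts by entity so each entity gets one sentence.
        by_entity.setdefault(entity, []).append(f"{value} {attribute}")
    sentences = [
        f"{entity} recorded " + " and ".join(facts) + "."
        for entity, facts in by_entity.items()
    ]
    return " ".join(sentences)


# Toy input table (made-up values).
table = [
    ("Player A", "points", "22"),
    ("Player A", "rebounds", "7"),
    ("Player A", "minutes", "35"),
]

selected = select_content(table, wanted=["points", "rebounds"])
text = realize(selected)
print(text)
```

Neural systems typically learn both steps from data rather than hard-coding them; the sketch only separates the two decisions so that each is visible.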

(Image credit: Data-to-Text Generation with Content Selection and Planning)

Most implemented papers

Language Models are Unsupervised Multitask Learners

openai/gpt-2 Preprint 2019

Natural language processing tasks, such as question answering, machine translation, reading comprehension, and summarization, are typically approached with supervised learning on task-specific datasets.

Challenges in Data-to-Document Generation

harvardnlp/data2text EMNLP 2017

Recent neural models have shown significant progress on the problem of generating short descriptive texts conditioned on a small number of database records.

MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance

AIPHES/emnlp19-moverscore IJCNLP 2019

A robust evaluation metric has a profound impact on the development of text generation systems.

Investigating Pretrained Language Models for Graph-to-Text Generation

UKPLab/plms-graph2text EMNLP (NLP4ConvAI) 2021

We show that the PLMs BART and T5 achieve new state-of-the-art results and that task-adaptive pretraining strategies improve their performance even further.

The E2E Dataset: New Challenges For End-to-End Generation

UFAL-DSG/tgen WS 2017

This paper describes the E2E data, a new dataset for training end-to-end, data-driven natural language generation systems in the restaurant domain, which is ten times bigger than existing, frequently used datasets in this area.
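
As a rough illustration of the structured input side of such a dataset, the sketch below parses a meaning representation written as comma-separated `slot[value]` pairs into a dictionary. The `slot[value]` layout, the example string, and the `parse_mr` helper are assumptions made for this example and should be checked against the dataset files before use.

```python
import re
from typing import Dict


def parse_mr(mr: str) -> Dict[str, str]:
    """Parse 'slot[value], slot[value]' into {slot: value} (assumed format)."""
    # A slot name is any run of characters without commas or brackets,
    # followed by a bracketed value.
    pairs = re.findall(r"([^,\[\]]+)\[([^\]]*)\]", mr)
    return {slot.strip(): value.strip() for slot, value in pairs}


# Made-up restaurant-domain example.
mr = "name[The Eagle], eatType[coffee shop], area[riverside]"
parsed = parse_mr(mr)
print(parsed)
```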

Oversampling for Imbalanced Learning Based on K-Means and SMOTE

felix-last/kmeans_smote 2 Nov 2017

Learning from class-imbalanced data continues to be a common and challenging problem in supervised learning as standard classification algorithms are designed to handle balanced class distributions.

Data-to-Text Generation with Content Selection and Planning

ratishsp/data2text-plan-py 3 Sep 2018

Recent advances in data-to-text generation have led to the use of large-scale datasets and neural network models which are trained end-to-end, without explicitly modeling what to say and in what order.

Deep Graph Convolutional Encoders for Structured Data to Text Generation

diegma/graph-2-text WS 2018

Most previous work on neural text generation from graph-structured data relies on standard sequence-to-sequence methods.

Handling Rare Items in Data-to-Text Generation

shimorina/webnlg-dataset WS 2018

Neural approaches to data-to-text generation generally handle rare input items using either delexicalisation or a copy mechanism.
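
Delexicalisation can be sketched in a few lines of Python: slot values in the text are swapped for placeholders before training, and the placeholders are filled back in after generation. The `<SLOT>` placeholder syntax, the helper names, and the example sentence are assumptions for illustration, not the procedure used in the paper; a copy mechanism, the alternative named above, is a model component and is not shown here.

```python
from typing import Dict


def delexicalise(text: str, slots: Dict[str, str]) -> str:
    """Replace each slot value found in `text` with a placeholder like <NAME>."""
    # Replace longer values first so a value that is a substring of
    # another value does not break the longer match.
    for slot, value in sorted(slots.items(), key=lambda kv: -len(kv[1])):
        text = text.replace(value, f"<{slot.upper()}>")
    return text


def relexicalise(text: str, slots: Dict[str, str]) -> str:
    """Fill placeholders back in with the concrete slot values."""
    for slot, value in slots.items():
        text = text.replace(f"<{slot.upper()}>", value)
    return text


# Made-up example input.
slots = {"name": "Zizzi", "food": "Italian"}
reference = "Zizzi serves Italian food."

template = delexicalise(reference, slots)
restored = relexicalise(template, slots)
print(template)
print(restored)
```

Because the model only ever sees the placeholder, a rare entity name does not need to appear in its vocabulary; the trade-off is that exact string matching can miss inflected or paraphrased values.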

Pragmatically Informative Text Generation

sIncerass/prag_generation NAACL 2019

We improve the informativeness of models for conditional text generation using techniques from computational pragmatics.