Data-to-Text Generation
107 papers with code • 24 benchmarks • 22 datasets
A classic problem in natural-language generation (NLG) is taking structured data, such as a table, as input and producing text that adequately and fluently describes it. Unlike machine translation, which aims for complete transduction of the source sentence, this form of NLG is usually taken to involve (at least) two separate challenges: what to say, the selection of an appropriate subset of the input data to discuss, and how to say it, the surface realization of the selected content.
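The two stages can be illustrated with a minimal sketch. The record fields and the salience heuristic below are hypothetical; real systems learn both content selection and realization from data.

```python
# Toy two-stage data-to-text pipeline: content selection, then realization.
# Record fields ("entity", "value") and the salience rule are illustrative.

def select_content(records, max_records=2):
    """'What to say': pick a salient subset of the input records."""
    # Toy salience heuristic: prefer records with larger numeric values.
    ranked = sorted(records, key=lambda r: r["value"], reverse=True)
    return ranked[:max_records]

def realize(records):
    """'How to say it': render the selected records as fluent text."""
    phrases = [f"{r['entity']} scored {r['value']} points" for r in records]
    return " and ".join(phrases) + "."

table = [
    {"entity": "LeBron James", "value": 31},
    {"entity": "Kevin Love", "value": 18},
    {"entity": "J.R. Smith", "value": 4},
]
print(realize(select_content(table)))
# -> LeBron James scored 31 points and Kevin Love scored 18 points.
```

Neural models typically replace both hand-written stages with learned components, but the what-to-say / how-to-say-it decomposition remains a useful way to analyze their errors.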
(Image credit: Data-to-Text Generation with Content Selection and Planning)
Libraries
Use these libraries to find Data-to-Text Generation models and implementations.

Most implemented papers
Transition-Based Deep Input Linearization
Traditional methods for deep NLG adopt pipeline approaches comprising stages such as constructing syntactic input, predicting function words, linearizing the syntactic input and generating the surface forms.
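The pipeline stages named above can be sketched as a chain of functions. Every stage implementation here is a toy stub standing in for a learned model, not the paper's method.

```python
# Hedged sketch of the traditional deep-NLG pipeline; each stage is a
# hand-written stub where a real system would use a trained component.

def build_syntactic_input(semantics):
    """Stage 1: map an input meaning to an unordered syntactic structure."""
    return {"head": semantics["verb"],
            "deps": [semantics["agent"], semantics["patient"]]}

def predict_function_words(tree):
    """Stage 2: add function words (here, just determiners)."""
    tree["deps"] = [("the", dep) for dep in tree["deps"]]
    return tree

def linearize(tree):
    """Stage 3: order the words (toy subject-verb-object ordering)."""
    (det1, subj), (det2, obj) = tree["deps"]
    return [det1, subj, tree["head"], det2, obj]

def realize_surface(tokens):
    """Stage 4: produce surface forms (stubbed: join and punctuate)."""
    return " ".join(tokens).capitalize() + "."

sem = {"verb": "chased", "agent": "dog", "patient": "cat"}
sentence = realize_surface(linearize(predict_function_words(build_syntactic_input(sem))))
print(sentence)  # -> The dog chased the cat.
```

The paper's transition-based approach replaces this cascade with a single incremental system, avoiding the error propagation that such fixed pipelines suffer from.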
Semantic Noise Matters for Neural Natural Language Generation
Neural natural language generation (NNLG) systems are known for their pathological outputs, i.e., generating text which is unrelated to the input specification.
A Hierarchical Model for Data-to-Text Generation
Linearizing the input records into a flat sequence, however, loses most of the structure contained in the data.
Revisiting Challenges in Data-to-Text Generation with Fact Grounding
Data-to-text generation models face challenges in ensuring data fidelity by referring to the correct input source.
Modeling Global and Local Node Contexts for Text Generation from Knowledge Graphs
Recent graph-to-text models generate text from graph-based data using either global or local aggregation to learn node representations.
Variational Template Machine for Data-to-Text Generation
We propose the variational template machine (VTM), a novel method to generate text descriptions from data tables.
Have Your Text and Use It Too! End-to-End Neural Data-to-Text Generation with Semantic Fidelity
Our generated text has a significantly better semantic fidelity than the state of the art across all four datasets.
ToTTo: A Controlled Table-To-Text Generation Dataset
We present ToTTo, an open-domain English table-to-text dataset with over 120,000 training examples that proposes a controlled generation task: given a Wikipedia table and a set of highlighted table cells, produce a one-sentence description.
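A ToTTo-style example can be sketched as follows. The field names and table contents below are illustrative, not the dataset's exact schema.

```python
# Hedged sketch of a controlled table-to-text example in the style of ToTTo;
# field names ("table", "highlighted_cells", "target") are assumptions.

example = {
    "table": [
        ["Year", "Title", "Role"],
        ["2008", "Iron Man", "Tony Stark"],
        ["2012", "The Avengers", "Tony Stark"],
    ],
    # (row, col) indices of the cells the one-sentence description must cover.
    "highlighted_cells": [(1, 1), (1, 2)],
    "target": "Tony Stark appeared in Iron Man.",
}

def highlighted_values(ex):
    """Collect the cell values a model is asked to describe."""
    return [ex["table"][r][c] for r, c in ex["highlighted_cells"]]

print(highlighted_values(example))  # -> ['Iron Man', 'Tony Stark']
```

Restricting the description to the highlighted cells is what makes the task "controlled": it fixes the what-to-say decision and evaluates models chiefly on realization and fidelity.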
GPT-too: A language-model-first approach for AMR-to-text generation
Abstract Meaning Representations (AMRs) are broad-coverage sentence-level semantic graphs.
Partially-Aligned Data-to-Text Generation with Distant Supervision
Partially-aligned data of this kind is much easier to obtain, since it can be produced automatically.