Document Summarization
195 papers with code • 7 benchmarks • 28 datasets
Automatic document summarization is the task of rewriting a document in a shorter form while retaining its important content. The two most popular paradigms are extractive and abstractive approaches. Extractive approaches build summaries by extracting parts of the original document (usually sentences), while abstractive methods may generate new words or phrases that are not in the original document.
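To make the extractive paradigm concrete, here is a minimal sketch in Python. It assumes a simple word-frequency scoring heuristic, which is an illustrative choice and not the method of any paper listed on this page. The function name `extractive_summary` and the parameter `k` are hypothetical.

```python
import re
from collections import Counter


def extractive_summary(text, k=2):
    """Return the k highest-scoring sentences, kept in their original order.

    Assumption: a sentence is scored by the average document-level
    frequency of its words. Real extractive systems use richer features
    or learned models.
    """
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    # Rank sentence indices by score, keep the top k, restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


if __name__ == "__main__":
    doc = "Cats sleep a lot. Cats eat fish. Dogs bark."
    print(extractive_summary(doc, k=2))
```

Every sentence in the output is copied verbatim from the input, which is the defining property of an extractive summary. An abstractive system would instead generate new text, typically with a sequence-to-sequence model.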
Most implemented papers
Generating (Factual?) Narrative Summaries of RCTs: Experiments with Neural Multi-Document Summarization
We enlist medical professionals to evaluate generated summaries, and we find that modern summarization systems yield consistently fluent and relevant synopses, but that they are not always factual.
Global-aware Beam Search for Neural Abstractive Summarization
A global scoring mechanism is then developed to regulate beam search to generate summaries in a near-global optimal fashion.
Quantitative Argument Summarization and Beyond: Cross-Domain Key Point Analysis
Recent work has proposed to summarize arguments by mapping them to a small set of expert-generated key points, where the salience of each key point corresponds to the number of its matching arguments.
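The salience idea described above (a key point's salience is the number of arguments matched to it) can be sketched as follows. The matching step here is an assumption: it uses plain token-overlap (Jaccard) similarity with a hypothetical `threshold`, whereas the paper's actual matching model is not described in this excerpt. All function names are illustrative.

```python
from collections import Counter


def tokens(text):
    """Lowercased whitespace tokens as a set."""
    return set(text.lower().split())


def match_key_point(argument, key_points, threshold=0.2):
    """Return the best-matching key point by Jaccard overlap, or None.

    Assumption: Jaccard overlap stands in for a learned matching model.
    """
    a = tokens(argument)
    best, best_score = None, 0.0
    for kp in key_points:
        b = tokens(kp)
        union = a | b
        score = len(a & b) / len(union) if union else 0.0
        if score > best_score:
            best, best_score = kp, score
    return best if best_score >= threshold else None


def key_point_salience(arguments, key_points, threshold=0.2):
    """Count matched arguments per key point, most salient first."""
    counts = Counter({kp: 0 for kp in key_points})
    for arg in arguments:
        kp = match_key_point(arg, key_points, threshold)
        if kp is not None:
            counts[kp] += 1
    return counts.most_common()


if __name__ == "__main__":
    key_points = ["school uniforms reduce bullying", "school uniforms are expensive"]
    arguments = [
        "uniforms reduce bullying in class",
        "uniforms are expensive for parents",
        "bullying drops when uniforms reduce differences",
        "I like pizza",
    ]
    for kp, n in key_point_salience(arguments, key_points):
        print(n, kp)
```

Arguments that match no key point above the threshold are left uncounted, so the salience scores need not sum to the number of arguments.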
MS2: Multi-Document Summarization of Medical Studies
In support of this goal, we release MS^2 (Multi-Document Summarization of Medical Studies), a dataset of over 470k documents and 20k summaries derived from the scientific literature.
PRIMERA: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization
We introduce PRIMERA, a pre-trained model for multi-document representation with a focus on summarization that reduces the need for dataset-specific architectures and large amounts of fine-tuning labeled data.
Summ^N: A Multi-Stage Summarization Framework for Long Input Dialogues and Documents
To the best of our knowledge, Summ^N is the first multi-stage split-then-summarize framework for long input summarization.
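A generic split-then-summarize loop can be sketched as below. This is not the authors' implementation: the chunking by word count, the `toy_summarizer` placeholder (which just keeps the first half of each chunk), and the `max_words` and `max_stages` parameters are all assumptions made for illustration. In practice the summarizer would be a trained model.

```python
def split_into_chunks(words, max_words):
    """Split a word list into consecutive chunks of at most max_words."""
    return [words[i:i + max_words] for i in range(0, len(words), max_words)]


def toy_summarizer(words, ratio=0.5):
    """Placeholder summarizer: keep the first `ratio` of the words (at least one)."""
    keep = max(1, int(len(words) * ratio))
    return words[:keep]


def multi_stage_summarize(text, max_words=50, summarizer=toy_summarizer, max_stages=10):
    """Repeatedly split and summarize until the text fits in one chunk.

    Returns the final summary and the number of coarse stages that ran.
    """
    words = text.split()
    stages = 0
    while len(words) > max_words and stages < max_stages:
        # Coarse stage: summarize each chunk, then concatenate the results.
        chunks = split_into_chunks(words, max_words)
        words = [w for chunk in chunks for w in summarizer(chunk)]
        stages += 1
    # Final stage: one pass over text that now fits within the limit.
    return " ".join(summarizer(words)), stages


if __name__ == "__main__":
    long_text = " ".join(f"w{i}" for i in range(200))
    summary, stages = multi_stage_summarize(long_text, max_words=50)
    print(stages, len(summary.split()))
```

The `max_stages` guard is a defensive choice: it keeps the loop finite even if a plugged-in summarizer fails to shorten its input.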
Proposition-Level Clustering for Multi-Document Summarization
Text clustering methods were traditionally incorporated into multi-document summarization (MDS) as a means for coping with considerable information repetition.
Neural Summarization by Extracting Sentences and Words
Traditional approaches to extractive summarization rely heavily on human-engineered features.