Document Summarization

195 papers with code • 7 benchmarks • 28 datasets

Automatic Document Summarization is the task of rewriting a document into a shorter form while retaining its important content. The two most popular paradigms are extractive and abstractive approaches. Extractive approaches build a summary by selecting parts of the original document (usually whole sentences), while abstractive approaches may generate new words or phrases that do not appear in the original document.

Source: HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization
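To make the extractive paradigm concrete, here is a minimal sketch in Python. It is not taken from HIBERT or from any paper listed on this page: it assumes a simple word-frequency scoring heuristic, and the function names `split_sentences` and `extractive_summary` are hypothetical. It only illustrates the defining property of extractive methods, namely that every summary sentence is copied verbatim from the source.

```python
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extractive_summary(document: str, num_sentences: int = 2) -> str:
    """Pick the highest-scoring sentences and return them in document order.

    Scoring is an illustrative assumption: a sentence's score is the average
    document-wide frequency of its words. No stop-word removal is done.
    """
    sentences = split_sentences(document)
    freq = Counter(re.findall(r"[a-z']+", document.lower()))

    def score(sentence: str) -> float:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Rank sentence indices by score, keep the top ones, then restore order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)


if __name__ == "__main__":
    doc = (
        "Cats sleep for most of the day. Cats hunt mice at night. "
        "The weather was pleasant. Many cats also enjoy climbing."
    )
    print(extractive_summary(doc, num_sentences=2))
```

An abstractive system would instead generate the summary token by token, typically with a sequence-to-sequence model, so its output is not restricted to sentences found in the input.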

Benchmarks

These leaderboards track progress in Document Summarization.

Dataset	Best Model
CNN / Daily Mail	Scrambled code + broken (alter)
HowSumm-Step	LexRank (query: step title)
HowSumm-Method	LexRank (query: method + article + steps titles)
BBC XSum	BigBird-Pegasus
Arxiv HEP-TH citation graph	DeepPyramidion
arXiv Summarization Dataset	DeepPyramidion
WikiLingua (tr->en)	DOCmT5

Libraries

Use these libraries to find Document Summarization models and implementations.

Library	Papers	GitHub stars
huggingface/transformers	3	125,167
thudm/swissarmytransformer	2	842
HHousen/TransformerSum	2	425
shashiongithub/XSum	2	340

Subtasks

Email Thread Summarization

Most implemented papers

Generating (Factual?) Narrative Summaries of RCTs: Experiments with Neural Multi-Document Summarization

allenai/mslr-shared-task • 25 Aug 2020

We enlist medical professionals to evaluate generated summaries, and we find that modern summarization systems yield consistently fluent and relevant synopses, but that they are not always factual.


Global-aware Beam Search for Neural Abstractive Summarization

yema2018/global_aware • NeurIPS 2021

A global scoring mechanism is then developed to regulate beam search to generate summaries in a near-global optimal fashion.


Quantitative Argument Summarization and Beyond: Cross-Domain Key Point Analysis

ibm/kpa_2021_shared_task • EMNLP 2020

Recent work has proposed to summarize arguments by mapping them to a small set of expert-generated key points, where the salience of each key point corresponds to the number of its matching arguments.


MS2: Multi-Document Summarization of Medical Studies

allenai/ms2 • 13 Apr 2021

In support of this goal, we release MS^2 (Multi-Document Summarization of Medical Studies), a dataset of over 470k documents and 20k summaries derived from the scientific literature.


PRIMERA: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization

allenai/primer • ACL 2022

We introduce PRIMERA, a pre-trained model for multi-document representation with a focus on summarization that reduces the need for dataset-specific architectures and large amounts of fine-tuning labeled data.


Summ^N: A Multi-Stage Summarization Framework for Long Input Dialogues and Documents

psunlpgroup/summ-n • ACL 2022

To the best of our knowledge, Summ$^N$ is the first multi-stage split-then-summarize framework for long input summarization.


Proposition-Level Clustering for Multi-Document Summarization

oriern/procluster • NAACL 2022

Text clustering methods were traditionally incorporated into multi-document summarization (MDS) as a means for coping with considerable information repetition.
