Document Summarization

196 papers with code • 7 benchmarks • 28 datasets

Automatic Document Summarization is the task of rewriting a document into a shorter form while retaining its important content. The two most popular paradigms are extractive and abstractive approaches. Extractive approaches build a summary by selecting parts of the original document (usually sentences), while abstractive methods may generate new words or phrases that do not appear in the original document.

Source: HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization
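
To make the extractive paradigm described above concrete, here is a minimal sketch in Python. It is not the HIBERT method or any model from the leaderboards below; it assumes a simple word-frequency score as a stand-in for a learned sentence scorer, and the function name `extractive_summary` and its parameters are hypothetical, chosen only for illustration.

```python
import re
from collections import Counter


def extractive_summary(document: str, num_sentences: int = 2) -> str:
    """Return the highest-scoring sentences, kept in their original order."""
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    # Word frequencies over the whole document (lowercased).
    word_counts = Counter(re.findall(r"[a-z']+", document.lower()))

    def score(sentence: str) -> float:
        words = re.findall(r"[a-z']+", sentence.lower())
        # Average frequency, so long sentences are not favoured automatically.
        return sum(word_counts[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score (ties keep document order), then
    # restore document order for the selected sentences.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)


if __name__ == "__main__":
    text = "Cats sleep a lot. Cats also hunt mice. Dogs bark."
    print(extractive_summary(text, num_sentences=2))
```

Every sentence in the output is copied verbatim from the input, which is the defining property of an extractive summary. An abstractive system would instead generate new text, typically with a sequence-to-sequence model, and cannot be reduced to a selection step like this one.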

Benchmarks


These leaderboards track progress in Document Summarization.

Dataset	Best Model
CNN / Daily Mail	Scrambled code + broken (alter)
HowSumm-Step	LexRank (query: step title)
HowSumm-Method	LexRank (query: method + article + steps titles)
BBC XSum	BigBird-Pegasus
Arxiv HEP-TH citation graph	DeepPyramidion
arXiv Summarization Dataset	DeepPyramidion
WikiLingua (tr->en)	DOCmT5

Libraries

Use these libraries to find Document Summarization models and implementations.

Library	Papers	Stars
huggingface/transformers	3	126,108
thudm/swissarmytransformer	2	848
HHousen/TransformerSum	2	425
shashiongithub/XSum	2	344



Subtasks

Email Thread Summarization

Most implemented papers


Neural Summarization by Extracting Sentences and Words

adrian9631/TextSumma • ACL 2016

Traditional approaches to extractive summarization rely heavily on human-engineered features.


Urdu Summary Corpus

humsha/USCorpus • LREC 2016

This paper reports the construction of a benchmark corpus for Urdu summaries (abstracts) to facilitate the development and evaluation of single document summarization systems for Urdu language.


MDSWriter: Annotation Tool for Creating High-Quality Multi-Document Summarization Corpora

UKPLab/mdswriter • ACL 2016


The Role of Discourse Units in Near-Extractive Summarization

grimpil/nyt-summ • WS 2016


Distraction-Based Neural Networks for Document Summarization

lukecq1231/nats • 26 Oct 2016

Distributed representation learned with neural networks has recently been shown to be effective in modeling natural languages at fine granularities such as words, phrases, and even sentences.


A General Optimization Framework for Multi-Document Summarization Using Genetic Algorithms and Swarm Intelligence

UKPLab/coling2016-genetic-swarm-MDS • COLING 2016

Extracting summaries via integer linear programming and submodularity are popular and successful techniques in extractive multi-document summarization.


Bridging the gap between extractive and abstractive summaries: Creation and evaluation of coherent extracts from heterogeneous sources

AIPHES/DBS • COLING 2016

Coherent extracts are a novel type of summary combining the advantages of manually created abstractive summaries, which are fluent but difficult to evaluate, and low-quality automatically created extractive summaries, which lack coherence and structure.


The Next Step for Multi-Document Summarization: A Heterogeneous Multi-Genre Corpus Built with a Novel Construction Approach

AIPHES/hMDS • COLING 2016

In a detailed analysis, we show that our new corpus is significantly different from the homogeneous corpora commonly used, and that it is heterogeneous along several dimensions.
