Document Summarization
195 papers with code • 7 benchmarks • 28 datasets
Automatic Document Summarization is the task of rewriting a document into a shorter form while still retaining its important content. The two most popular paradigms are extractive approaches and abstractive approaches. Extractive approaches generate summaries by extracting parts of the original document (usually sentences), while abstractive methods may generate new words or phrases that are not in the original document.
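To make the extractive paradigm concrete, here is a minimal sketch of a frequency-based extractive summarizer. The function name and scoring scheme are illustrative assumptions, not any particular paper's method: it scores each sentence by the average document-wide frequency of its words and keeps the top-scoring sentences in their original order.

```python
import re
from collections import Counter

def extractive_summary(document: str, num_sentences: int = 2) -> str:
    """Toy extractive summarizer (illustrative sketch, not a published method).

    Scores each sentence by the mean document-wide frequency of its words,
    then returns the top-scoring sentences in their original order.
    """
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    freq = Counter(re.findall(r"\w+", document.lower()))
    scored = []
    for idx, sent in enumerate(sentences):
        tokens = re.findall(r"\w+", sent.lower())
        if not tokens:
            continue
        score = sum(freq[t] for t in tokens) / len(tokens)
        scored.append((score, idx, sent))
    top = sorted(scored, reverse=True)[:num_sentences]
    top.sort(key=lambda item: item[1])  # restore document order
    return " ".join(sent for _, _, sent in top)
```

An abstractive system, by contrast, would generate the summary token by token with a language model, so its output need not be a subset of the input sentences.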
Libraries
Use these libraries to find Document Summarization models and implementations.
Most implemented papers
Scoring Sentence Singletons and Pairs for Abstractive Summarization
There is thus a crucial gap between sentence selection and fusion to support summarizing by both compressing single sentences and fusing pairs.
AREDSUM: Adaptive Redundancy-Aware Iterative Sentence Ranking for Extractive Document Summarization
Redundancy-aware extractive summarization systems score the redundancy of the sentences to be included in a summary either jointly with their salience information or separately as an additional sentence scoring step.
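The "separate scoring step" approach described above is often realized with greedy, MMR-style selection. The sketch below is a generic illustration of that idea, not AREDSUM's actual scoring: at each step it picks the sentence that best trades off salience against its maximum similarity to sentences already selected (the `salience` and `similarity` inputs are assumed to come from an upstream model).

```python
def mmr_select(salience, similarity, k, lam=0.7):
    """Greedy redundancy-aware sentence selection (generic MMR-style sketch,
    not AREDSUM's exact method).

    salience[i]      -- relevance score of sentence i (from an upstream model)
    similarity[i][j] -- pairwise sentence similarity in [0, 1]
    lam              -- trade-off between salience and redundancy
    Returns the indices of the k selected sentences, in selection order.
    """
    selected = []
    candidates = set(range(len(salience)))
    while candidates and len(selected) < k:
        def score(i):
            # Penalize by the worst overlap with anything already chosen.
            redundancy = max((similarity[i][j] for j in selected), default=0.0)
            return lam * salience[i] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With this formulation, a highly salient sentence can still lose to a less salient but more novel one once its near-duplicate is already in the summary.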
DebateSum: A large-scale argument mining and summarization dataset
Finally, we present a search engine for this dataset which is utilized extensively by members of the National Speech and Debate Association today.
Centroid-based Text Summarization through Compositionality of Word Embeddings
The textual similarity is a crucial aspect for many extractive text summarization methods.
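One common way to operationalize that similarity with word embeddings is centroid-based scoring: represent the document by the mean of its word vectors and rank sentences by cosine similarity to that centroid. The sketch below is a simplified illustration of this general idea (the function names and the tiny lookup-table embeddings are assumptions for the example, not the paper's implementation).

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 on zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_by_centroid(sentences, embed):
    """Score sentences by cosine similarity between each sentence vector
    (sum of its word embeddings) and the document centroid (mean embedding).

    `embed` maps a word to a vector; any dict of toy vectors works here.
    Words missing from `embed` are simply skipped.
    """
    tokenized = [s.lower().split() for s in sentences]
    all_words = [w for toks in tokenized for w in toks if w in embed]
    dim = len(next(iter(embed.values())))
    centroid = [sum(embed[w][d] for w in all_words) / len(all_words)
                for d in range(dim)]
    scores = []
    for toks in tokenized:
        vecs = [embed[w] for w in toks if w in embed]
        if not vecs:
            scores.append(0.0)
            continue
        sent_vec = [sum(v[d] for v in vecs) for d in range(dim)]
        scores.append(cosine(sent_vec, centroid))
    return scores
```

Sentences whose words cluster near the document's overall topic vector score highest, which is exactly why the quality of the similarity measure matters so much for this family of methods.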
TAP-DLND 1.0 : A Corpus for Document Level Novelty Detection
Detecting novelty of an entire document is an Artificial Intelligence (AI) frontier problem that has widespread NLP applications, such as extractive document summarization, tracking development of news events, predicting impact of scholarly articles, etc.
Extractive Summarization as Text Matching
This paper creates a paradigm shift with regard to the way we build neural extractive summarization systems.
Screenplay Summarization Using Latent Narrative Structure
Most general-purpose extractive summarization models are trained on news articles, which are short and present all important information upfront.
On Faithfulness and Factuality in Abstractive Summarization
It is well known that the standard likelihood training and approximate decoding objectives in neural text generation models lead to less human-like responses for open-ended tasks such as language modeling and story generation.
Leveraging Graph to Improve Abstractive Multi-Document Summarization
Graphs that capture relations between textual units have great benefits for detecting salient information from multiple documents and generating overall coherent summaries.
Pre-training via Paraphrasing
The objective noisily captures aspects of paraphrase, translation, multi-document summarization, and information retrieval, allowing for strong zero-shot performance on several tasks.