Document Summarization

195 papers with code • 7 benchmarks • 28 datasets

Automatic Document Summarization is the task of rewriting a document into a shorter form while retaining its important content. The two most popular paradigms are extractive and abstractive approaches. Extractive approaches generate summaries by extracting parts of the original document (usually sentences), while abstractive methods may generate new words or phrases that are not in the original document.

Source: HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization
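
To make the extractive paradigm concrete, here is a minimal sketch in Python. It is not the HIBERT method or any benchmarked model: it assumes a simple word-frequency scoring heuristic, a naive regex sentence splitter, and a tiny hand-picked stopword list, all of which are illustrative choices. The function names (`split_sentences`, `tokenize`, `extractive_summary`) are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it",
             "that", "on", "for", "with", "as", "are", "was"}


def split_sentences(text):
    """Naive splitter: break on whitespace that follows '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase word tokens with stopwords removed."""
    words = re.findall(r"[a-z0-9']+", sentence.lower())
    return [w for w in words if w not in STOPWORDS]


def extractive_summary(text, num_sentences=2):
    """Pick the highest-scoring sentences and return them in document order.

    A sentence's score is the average document-level frequency of its
    non-stopword tokens (a heuristic assumed here for illustration).
    """
    sentences = split_sentences(text)
    freqs = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        tokens = tokenize(sentence)
        if not tokens:
            return 0.0
        return sum(freqs[w] for w in tokens) / len(tokens)

    # Rank sentence indices by score (ties keep original order), then
    # restore document order for the selected ones.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)


if __name__ == "__main__":
    doc = ("Transformers are widely used for summarization. "
           "Summarization models compress long documents. "
           "The weather was pleasant yesterday. "
           "Long documents challenge summarization models.")
    print(extractive_summary(doc, num_sentences=2))
```

Every sentence in the output is copied verbatim from the input, which is the defining property of an extractive summary. An abstractive system would instead generate new text, typically with a trained sequence-to-sequence model.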

Benchmarks

These leaderboards are used to track progress in Document Summarization.

Dataset	Best Model
CNN / Daily Mail	Scrambled code + broken (alter)
HowSumm-Step	LexRank (query: step title)
HowSumm-Method	LexRank (query: method + article + steps titles)
BBC XSum	BigBird-Pegasus
Arxiv HEP-TH citation graph	DeepPyramidion
arXiv Summarization Dataset	DeepPyramidion
WikiLingua (tr->en)	DOCmT5

Libraries

Use these libraries to find Document Summarization models and implementations.

huggingface/transformers • 3 papers • 125,425

thudm/swissarmytransformer • 2 papers • 843

HHousen/TransformerSum • 2 papers • 425

shashiongithub/XSum • 2 papers • 343

See all 6 libraries.

Datasets

Subtasks

Email Thread Summarization

Latest papers

Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator

lumia-group/fouriertransformer • 24 May 2023

Many researchers have focused on designing new forms of self-attention or introducing new parameters to overcome this limitation; however, many of these approaches prevent the model from inheriting weights from large pretrained models.

Revisiting Sentence Union Generation as a Testbed for Text Consolidation

eranhirs/sentence_union_generation • 24 May 2023

In this paper, we suggest revisiting the sentence union generation task as an effective well-defined testbed for assessing text consolidation capabilities, decoupling the consolidation challenge from subjective content selection.

Automated Metrics for Medical Multi-Document Summarization Disagree with Human Evaluations

allenai/mslr-annotated-dataset • 23 May 2023

We analyze how automated summarization evaluation metrics correlate with lexical features of generated summaries, with other automated metrics (including several we propose in this work), and with aspects of human-assessed summary quality.

A Hierarchical Encoding-Decoding Scheme for Abstractive Multi-document Summarization

damo-nlp-sg/hierencdec • 15 May 2023

Pre-trained language models (PLMs) have achieved outstanding results in abstractive single-document summarization (SDS).

Summarizing Multiple Documents with Conversational Structure for Meta-Review Generation

oaimli/peersum • 2 May 2023

We present PeerSum, a novel dataset for generating meta-reviews of scientific papers.

Enhancing Large Language Model with Self-Controlled Memory Framework

wbbeyourself/scm4llms • 26 Apr 2023

Large Language Models (LLMs) are constrained by their inability to process lengthy inputs, resulting in the loss of critical historical information.

XWikiGen: Cross-lingual Summarization for Encyclopedic Text Generation in Low Resource Languages

DhavalTaunk08/XWikiGen • 22 Mar 2023

However, for low-resource languages, the scarcity of reference articles makes monolingual summarization ineffective in solving this problem.

Compressed Heterogeneous Graph for Abstractive Multi-Document Summarization

oaimli/hgsum • 12 Mar 2023

We propose HGSUM, an MDS model that extends an encoder-decoder architecture to incorporate a heterogeneous graph representing different semantic units (e.g., words and sentences) of the documents.

PDSum: Prototype-driven Continuous Summarization of Evolving Multi-document Sets Stream

cliveyn/pdsum • 10 Feb 2023

Summarizing text-rich documents has long been studied in the literature, but most existing efforts summarize a static, predefined multi-document set.

Generating a Structured Summary of Numerous Academic Papers: Dataset and Method

stevenlau6/bigsurvey • 9 Feb 2023

Existing MDS datasets usually focus on producing a structureless summary covering a few input documents.
