Document Summarization

195 papers with code • 7 benchmarks • 28 datasets

Automatic document summarization is the task of rewriting a document into a shorter form while retaining its important content. The two most popular paradigms are extractive and abstractive approaches. Extractive approaches generate summaries by extracting parts of the original document (usually sentences), while abstractive methods may generate new words or phrases that are not in the original document.

Source: HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization
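
As a rough illustration of the extractive paradigm described above, the following Python sketch scores each sentence by the average document-wide frequency of its words and returns the top-scoring sentences verbatim, in their original order. The scoring heuristic, the naive sentence splitter, and the function names (`split_sentences`, `extractive_summary`) are assumptions made for illustration; they are not taken from HIBERT or from any paper listed on this page.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extractive_summary(text, num_sentences=2):
    """Return the highest-scoring sentences, unchanged and in document order."""
    sentences = split_sentences(text)
    # Word frequencies over the whole document (no stop-word filtering,
    # which a real system would usually add).
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        # Average rather than sum, so long sentences are not favored by length alone.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]),
                    reverse=True)
    chosen = sorted(ranked[:num_sentences])  # restore document order
    return [sentences[i] for i in chosen]


if __name__ == "__main__":
    doc = (
        "Transformers process text with attention. "
        "Attention compares every token with every other token. "
        "The weather was nice. "
        "Attention cost grows with text length."
    )
    for sentence in extractive_summary(doc, num_sentences=2):
        print(sentence)
```

Every sentence in the output is copied from the input, which is the defining property of an extractive summary. An abstractive system would instead generate new text, typically with a trained sequence-to-sequence model, and is beyond the scope of a short sketch.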

Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator

lumia-group/fouriertransformer 24 May 2023

Many researchers have focused on designing new forms of self-attention or introducing new parameters to overcome this limitation; however, a large portion of these approaches prevent the model from inheriting weights from large pretrained models.

19 stars

Revisiting Sentence Union Generation as a Testbed for Text Consolidation

eranhirs/sentence_union_generation 24 May 2023

In this paper, we suggest revisiting the sentence union generation task as an effective, well-defined testbed for assessing text consolidation capabilities, decoupling the consolidation challenge from subjective content selection.

0 stars

Automated Metrics for Medical Multi-Document Summarization Disagree with Human Evaluations

allenai/mslr-annotated-dataset 23 May 2023

We analyze how automated summarization evaluation metrics correlate with lexical features of generated summaries, with other automated metrics (including several we propose in this work), and with aspects of human-assessed summary quality.

5 stars

A Hierarchical Encoding-Decoding Scheme for Abstractive Multi-document Summarization

damo-nlp-sg/hierencdec 15 May 2023

Pre-trained language models (PLMs) have achieved outstanding results in abstractive single-document summarization (SDS).

5 stars

Summarizing Multiple Documents with Conversational Structure for Meta-Review Generation

oaimli/peersum 2 May 2023

We present PeerSum, a novel dataset for generating meta-reviews of scientific papers.

12 stars

Enhancing Large Language Model with Self-Controlled Memory Framework

wbbeyourself/scm4llms 26 Apr 2023

Large Language Models (LLMs) are constrained by their inability to process lengthy inputs, resulting in the loss of critical historical information.

33 stars

XWikiGen: Cross-lingual Summarization for Encyclopedic Text Generation in Low Resource Languages

DhavalTaunk08/XWikiGen 22 Mar 2023

For low-resource languages, however, the scarcity of reference articles makes monolingual summarization ineffective in solving this problem.

2 stars

Compressed Heterogeneous Graph for Abstractive Multi-Document Summarization

oaimli/hgsum 12 Mar 2023

We propose HGSUM, an MDS model that extends an encoder-decoder architecture to incorporate a heterogeneous graph representing different semantic units (e.g., words and sentences) of the documents.

4 stars

PDSum: Prototype-driven Continuous Summarization of Evolving Multi-document Sets Stream

cliveyn/pdsum 10 Feb 2023

Summarizing text-rich documents has long been studied in the literature, but most existing efforts have focused on summarizing a static, predefined multi-document set.

3 stars

Generating a Structured Summary of Numerous Academic Papers: Dataset and Method

stevenlau6/bigsurvey 9 Feb 2023

Existing MDS datasets usually focus on producing a structureless summary covering a few input documents.

5 stars