Document Summarization
195 papers with code • 7 benchmarks • 28 datasets
Automatic document summarization is the task of rewriting a document into a shorter form while retaining its important content. The two most popular paradigms are extractive and abstractive approaches. Extractive approaches generate summaries by extracting parts of the original document (usually sentences), while abstractive methods may generate new words or phrases that are not in the original document.
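To make the extractive paradigm concrete, here is a minimal sketch in Python that scores each sentence by the average document-wide frequency of its content words and returns the top-scoring sentences in their original order. The function names, the stopword list, and the scoring rule are illustrative assumptions, not a method taken from any paper listed on this page; practical systems use far stronger sentence scorers.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use larger ones).
STOPWORDS = frozenset({
    "a", "an", "the", "and", "or", "of", "to", "in", "is", "are", "was",
    "at", "it", "that", "this", "for", "on", "with", "as", "by", "also",
})


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extractive_summary(text, num_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    tokenized = [re.findall(r"[a-z']+", s.lower()) for s in sentences]

    # Word frequencies over the whole document act as a crude importance signal.
    freqs = Counter(w for words in tokenized for w in words if w not in STOPWORDS)

    def score(words):
        content = [w for w in words if w not in STOPWORDS]
        # Average (not sum) so long sentences are not favoured automatically.
        return sum(freqs[w] for w in content) / len(content) if content else 0.0

    # Rank sentence indices by score; ties keep document order (stable sort).
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(tokenized[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in keep)


if __name__ == "__main__":
    doc = ("Cats sleep a lot. Cats hunt mice at night. "
           "The weather was mild. Many cats also hunt birds.")
    print(extractive_summary(doc, num_sentences=2))
```

Every sentence in the output is copied verbatim from the input, which is the defining property of an extractive summary. An abstractive system would instead generate new text, typically with a sequence-to-sequence model, and may use words that never appear in the source.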
Latest papers
Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator
Many researchers have focused on designing new forms of self-attention or introducing new parameters to overcome this limitation; however, a large portion of these approaches prevent the model from inheriting weights from large pretrained models.
Revisiting Sentence Union Generation as a Testbed for Text Consolidation
In this paper, we suggest revisiting the sentence union generation task as an effective well-defined testbed for assessing text consolidation capabilities, decoupling the consolidation challenge from subjective content selection.
Automated Metrics for Medical Multi-Document Summarization Disagree with Human Evaluations
We analyze how automated summarization evaluation metrics correlate with lexical features of generated summaries, to other automated metrics including several we propose in this work, and to aspects of human-assessed summary quality.
A Hierarchical Encoding-Decoding Scheme for Abstractive Multi-document Summarization
Pre-trained language models (PLMs) have achieved outstanding results in abstractive single-document summarization (SDS).
Summarizing Multiple Documents with Conversational Structure for Meta-Review Generation
We present PeerSum, a novel dataset for generating meta-reviews of scientific papers.
Enhancing Large Language Model with Self-Controlled Memory Framework
Large Language Models (LLMs) are constrained by their inability to process lengthy inputs, resulting in the loss of critical historical information.
XWikiGen: Cross-lingual Summarization for Encyclopedic Text Generation in Low Resource Languages
But, for low-resource languages, the scarcity of reference articles makes monolingual summarization ineffective in solving this problem.
Compressed Heterogeneous Graph for Abstractive Multi-Document Summarization
We propose HGSUM, an MDS model that extends an encoder-decoder architecture to incorporate a heterogeneous graph to represent different semantic units (e.g., words and sentences) of the documents.
PDSum: Prototype-driven Continuous Summarization of Evolving Multi-document Sets Stream
Summarizing text-rich documents has long been studied in the literature, but most existing efforts summarize a static and predefined multi-document set.
Generating a Structured Summary of Numerous Academic Papers: Dataset and Method
Existing MDS datasets usually focus on producing a structureless summary covering a few input documents.