Document Summarization

194 papers with code • 7 benchmarks • 28 datasets

Automatic Document Summarization is the task of rewriting a document into a shorter form while retaining its important content. The two most popular paradigms are extractive and abstractive approaches. Extractive approaches generate summaries by extracting parts of the original document (usually sentences), while abstractive approaches may generate new words or phrases that are not in the original document.

Source: HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization
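
To make the extractive paradigm concrete, here is a minimal sketch of a word-frequency baseline: it scores each sentence by the average document-level frequency of its words and returns the top `k` sentences in their original order. It is not taken from HIBERT or any paper listed here; the function names (`split_sentences`, `tokenize`, `extractive_summary`), the regex-based sentence splitter, and the scoring rule are all assumptions chosen for illustration.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (an assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercased alphanumeric tokens; no stemming or stop-word removal.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def extractive_summary(text, k=2):
    """Return the k highest-scoring sentences, kept in document order."""
    sentences = split_sentences(text)
    # Word frequencies over the whole document.
    freq = Counter(tok for s in sentences for tok in tokenize(s))

    def score(sentence):
        toks = tokenize(sentence)
        # Average word frequency, so long sentences are not favored by length alone.
        return sum(freq[t] for t in toks) / len(toks) if toks else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]),
                    reverse=True)[:k]
    # Extractive: the output consists only of sentences copied from the input.
    return [sentences[i] for i in sorted(ranked)]
```

Every sentence in the output appears verbatim in the input, which is the defining property of an extractive method; an abstractive system would instead generate text with a sequence-to-sequence model and could produce words absent from the source.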

Most implemented papers

Scoring Sentence Singletons and Pairs for Abstractive Summarization

ucfnlp/summarization-sing-pair-mix ACL 2019

There is thus a crucial gap between sentence selection and fusion to support summarizing by both compressing single sentences and fusing pairs.

AREDSUM: Adaptive Redundancy-Aware Iterative Sentence Ranking for Extractive Document Summarization

kepingbi/ARedSumSentRank EACL 2021

Redundancy-aware extractive summarization systems score the redundancy of the sentences to be included in a summary either jointly with their salience information or separately as an additional sentence scoring step.
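
As a rough illustration of treating redundancy as an additional scoring step, the sketch below applies a greedy selection in the style of maximal marginal relevance (MMR) on top of precomputed salience scores. This is not the AREDSUM model: the Jaccard word-overlap redundancy measure, the `trade_off` weight, and the function names (`tokens`, `jaccard`, `select_sentences`) are assumptions made for this example, and the salience scores are expected to come from some other scorer.

```python
import re


def tokens(sentence):
    # Set of lowercased alphanumeric tokens in the sentence.
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a, b):
    # Word-overlap similarity between two token sets, in [0, 1].
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_sentences(sentences, salience, k=2, trade_off=0.7):
    """Greedily pick k sentences, trading salience off against redundancy."""
    token_sets = [tokens(s) for s in sentences]
    chosen = []
    candidates = list(range(len(sentences)))
    while candidates and len(chosen) < k:
        def mmr(i):
            # Redundancy: highest overlap with any sentence already selected.
            redundancy = max(
                (jaccard(token_sets[i], token_sets[j]) for j in chosen),
                default=0.0,
            )
            return trade_off * salience[i] - (1 - trade_off) * redundancy

        best = max(candidates, key=mmr)
        chosen.append(best)
        candidates.remove(best)
    # Sentences are returned in the order they were selected.
    return [sentences[i] for i in chosen]
```

With `trade_off=1.0` the loop reduces to plain salience ranking; lowering it penalizes candidates that overlap heavily with sentences already in the summary, so a near-duplicate of a selected sentence can lose to a less salient but novel one.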

DebateSum: A large-scale argument mining and summarization dataset

Hellisotherpeople/DebateSum COLING (ArgMining) 2020

Finally, we present a search engine for this dataset which is utilized extensively by members of the National Speech and Debate Association today.

Centroid-based Text Summarization through Compositionality of Word Embeddings

gaetangate/text-summarizer WS 2017

The textual similarity is a crucial aspect for many extractive text summarization methods.

TAP-DLND 1.0 : A Corpus for Document Level Novelty Detection

edithal-14/A-Deep-Neural-Solution-To-Document-Level-Novelty-Detection-COLING-2018- LREC 2018

Detecting novelty of an entire document is an Artificial Intelligence (AI) frontier problem that has widespread NLP applications, such as extractive document summarization, tracking development of news events, predicting impact of scholarly articles, etc.

Extractive Summarization as Text Matching

maszhongming/MatchSum ACL 2020

This paper creates a paradigm shift with regard to the way we build neural extractive summarization systems.

Screenplay Summarization Using Latent Narrative Structure

EdinburghNLP/csi-corpus ACL 2020

Most general-purpose extractive summarization models are trained on news articles, which are short and present all important information upfront.

On Faithfulness and Factuality in Abstractive Summarization

google-research-datasets/xsum_hallucination_annotations ACL 2020

It is well known that the standard likelihood training and approximate decoding objectives in neural text generation models lead to less human-like responses for open-ended tasks such as language modeling and story generation.

Leveraging Graph to Improve Abstractive Multi-Document Summarization

PaddlePaddle/Research ACL 2020

Graphs that capture relations between textual units have great benefits for detecting salient information from multiple documents and generating overall coherent summaries.

Pre-training via Paraphrasing

lucidrains/marge-pytorch NeurIPS 2020

The objective noisily captures aspects of paraphrase, translation, multi-document summarization, and information retrieval, allowing for strong zero-shot performance on several tasks.