Document Summarization

196 papers with code • 7 benchmarks • 28 datasets

Automatic Document Summarization is the task of rewriting a document into a shorter form while retaining its important content. The two most popular paradigms are extractive approaches and abstractive approaches. Extractive approaches generate summaries by extracting parts of the original document (usually sentences), while abstractive methods may generate new words or phrases that do not appear in the original document.
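To make the extractive paradigm concrete, here is a minimal sketch of an extractive summarizer. It assumes a naive regex sentence splitter and a simple word-frequency score (both illustrative choices, not taken from any paper listed here; `extractive_summary` is a hypothetical name). Every output sentence is copied verbatim from the input, which is the defining property of extractive methods.

```python
import re
from collections import Counter


def extractive_summary(document: str, k: int = 2) -> list[str]:
    """Return the k highest-scoring sentences, kept in their original order.

    Scoring is an illustrative frequency heuristic (an assumption, not a
    specific published method): a sentence's score is the mean
    document-level frequency of its words.
    """
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    # Word frequencies over the whole document, case-insensitive.
    freq = Counter(re.findall(r"[a-z']+", document.lower()))

    def score(sentence: str) -> float:
        words = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score, keep the top k, then restore document order.
    top = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


doc = ("Summarization shortens text. "
       "Extractive summarization copies sentences from the text. "
       "Cats are cute.")
print(extractive_summary(doc, k=1))  # ['Summarization shortens text.']
```

An abstractive system, by contrast, would generate new text rather than select from `sentences`, so it cannot be expressed as a ranking over the input like this.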

Source: HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization


Most implemented papers

Neural Summarization by Extracting Sentences and Words

adrian9631/TextSumma ACL 2016

Traditional approaches to extractive summarization rely heavily on human-engineered features.

Urdu Summary Corpus

humsha/USCorpus LREC 2016

This paper reports the construction of a benchmark corpus for Urdu summaries (abstracts) to facilitate the development and evaluation of single document summarization systems for Urdu language.

Distraction-Based Neural Networks for Document Summarization

lukecq1231/nats 26 Oct 2016

Distributed representations learned with neural networks have recently been shown to be effective in modeling natural languages at fine granularities such as words, phrases, and even sentences.

A General Optimization Framework for Multi-Document Summarization Using Genetic Algorithms and Swarm Intelligence

UKPLab/coling2016-genetic-swarm-MDS COLING 2016

Extracting summaries via integer linear programming and submodularity are popular and successful techniques in extractive multi-document summarization.
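As a rough illustration of the submodularity-based extraction mentioned above, the sketch below greedily selects sentences to maximize distinct-word coverage (a monotone submodular objective) under a word budget, ranking candidates by marginal gain per word of cost. This is a simplified generic greedy procedure under stated assumptions (coverage of distinct words as the objective, word count as the cost); it is not the genetic-algorithm or swarm-intelligence method of this paper, and `greedy_submodular_summary` is a hypothetical name.

```python
import re


def greedy_submodular_summary(sentences: list[str], budget: int) -> list[str]:
    """Greedily pick sentences maximizing distinct-word coverage within a word budget.

    Assumed objective: number of distinct words covered (monotone submodular).
    Assumed cost: sentence length in whitespace-separated words.
    """
    def words(s: str) -> set[str]:
        return set(re.findall(r"[a-z']+", s.lower()))

    chosen: list[int] = []
    covered: set[str] = set()
    used = 0
    remaining = set(range(len(sentences)))
    while remaining:
        best, best_ratio = None, 0.0
        # Iterate in index order so ties are broken deterministically.
        for i in sorted(remaining):
            cost = len(sentences[i].split())
            if cost == 0 or used + cost > budget:
                continue  # sentence is empty or would exceed the budget
            gain = len(words(sentences[i]) - covered)  # marginal coverage gain
            ratio = gain / cost
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:  # nothing fits, or nothing adds new coverage
            break
        chosen.append(best)
        covered |= words(sentences[best])
        used += len(sentences[best].split())
        remaining.remove(best)
    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(chosen)]


sents = ["the cat sat", "the cat sat down", "dogs bark loudly"]
print(greedy_submodular_summary(sents, budget=6))  # ['the cat sat', 'dogs bark loudly']
```

Diminishing returns is what makes greedy selection attractive here: once "the cat sat" is chosen, "the cat sat down" would add only one new word, so an unrelated sentence wins the remaining budget. An ILP formulation would instead solve the same budgeted selection exactly, at higher computational cost.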

Bridging the gap between extractive and abstractive summaries: Creation and evaluation of coherent extracts from heterogeneous sources

AIPHES/DBS COLING 2016

Coherent extracts are a novel type of summary combining the advantages of manually created abstractive summaries, which are fluent but difficult to evaluate, and low-quality automatically created extractive summaries, which lack coherence and structure.

The Next Step for Multi-Document Summarization: A Heterogeneous Multi-Genre Corpus Built with a Novel Construction Approach

AIPHES/hMDS COLING 2016

In a detailed analysis, we show that our new corpus is significantly different from the homogeneous corpora commonly used, and that it is heterogeneous along several dimensions.

Automatic Argumentative-Zoning Using Word2vec

abstatic/ire_project18 29 Mar 2017

Compared with the hand-crafted features, the word2vec method performed better for most of the categories.

Neural Extractive Summarization with Side Information

shashiongithub/sidenet 14 Apr 2017

Most extractive summarization methods focus on the main body of the document from which sentences need to be extracted.