Document Summarization
196 papers with code • 7 benchmarks • 28 datasets
Automatic Document Summarization is the task of rewriting a document into a shorter form while still retaining its important content. The two most popular paradigms are extractive and abstractive approaches. Extractive approaches generate summaries by extracting parts of the original document (usually sentences), while abstractive methods may generate new words or phrases that do not appear in the original document.
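To make the extractive paradigm concrete, here is a minimal sketch that scores each sentence by the average document-wide frequency of its words and copies the top k sentences verbatim, in their original order. The naive regex sentence splitter, the frequency-based score, and the function names are illustrative assumptions, not a method from any of the papers listed here.

```python
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    # Naive splitter (an assumption): break after ".", "!" or "?" followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def tokenize(sentence: str) -> list[str]:
    # Lowercase and keep alphabetic tokens (plus apostrophes) only.
    return re.findall(r"[a-z']+", sentence.lower())


def extractive_summary(text: str, k: int = 2) -> list[str]:
    """Return the k highest-scoring sentences, copied verbatim, in document order."""
    sentences = split_sentences(text)
    # Word frequencies over the whole document act as a crude importance signal.
    freqs = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence: str) -> float:
        words = tokenize(sentence)
        # Average frequency of the sentence's words; empty sentences score 0.
        return sum(freqs[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    # Keep the top k, then restore original order so the extract reads naturally.
    return [sentences[i] for i in sorted(ranked[:k])]


doc = "Cats purr. Cats purr loudly at night. Dogs bark."
print(extractive_summary(doc, k=2))  # ['Cats purr.', 'Cats purr loudly at night.']
```

Because every output sentence is copied from the input, the result is extractive by construction; an abstractive system would instead generate text, typically with a sequence-to-sequence model. Averaging (rather than summing) word frequencies favours short sentences, which is one of many design choices a real system would tune.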
Most implemented papers
Neural Summarization by Extracting Sentences and Words
Traditional approaches to extractive summarization rely heavily on human-engineered features.
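As an illustration of what "human-engineered features" can mean, the sketch below hand-codes three classic sentence features (position in the document, length, and overlap with the title) and combines them with fixed weights. The specific features, the 20-word length cap, and the `WEIGHTS` values are hypothetical choices for illustration only; they are not taken from the paper above, which proposes a neural alternative to such features.

```python
import re

# Hypothetical hand-tuned weights for (position, length, title overlap).
WEIGHTS = (0.5, 0.2, 0.3)


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())


def sentence_features(sentence: str, index: int, n_sentences: int, title_words: set[str]) -> list[float]:
    words = tokenize(sentence)
    position = 1.0 - index / n_sentences  # earlier sentences get higher values
    length = min(len(words) / 20.0, 1.0)  # favour sentences up to ~20 words (assumed cap)
    overlap = len(set(words) & title_words) / len(title_words) if title_words else 0.0
    return [position, length, overlap]


def rank_sentences(sentences: list[str], title: str) -> list[int]:
    """Return sentence indices ordered from most to least summary-worthy."""
    title_words = set(tokenize(title))
    n = len(sentences)
    scores = [
        sum(w * f for w, f in zip(WEIGHTS, sentence_features(s, i, n, title_words)))
        for i, s in enumerate(sentences)
    ]
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


print(rank_sentences(["Weather was nice.", "Neural summarization extracts sentences."],
                     "Neural summarization"))  # [1, 0]
```

Every feature and weight here has to be designed and tuned by hand, which is the dependency that neural extractive models try to remove by learning sentence representations directly from data.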
Urdu Summary Corpus
This paper reports the construction of a benchmark corpus for Urdu summaries (abstracts) to facilitate the development and evaluation of single-document summarization systems for the Urdu language.
Distraction-Based Neural Networks for Document Summarization
Distributed representations learned with neural networks have recently been shown to be effective in modeling natural languages at fine granularities such as words, phrases, and even sentences.
A General Optimization Framework for Multi-Document Summarization Using Genetic Algorithms and Swarm Intelligence
Integer linear programming and submodular optimization are popular and successful techniques for extracting summaries in extractive multi-document summarization.
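To show why submodularity is useful here, the following sketch treats words as "concepts", weights each concept by how many input sentences mention it, and greedily adds the sentence with the best marginal-gain-to-length ratio until a word budget is reached. Weighted coverage is monotone submodular, so adding a sentence that repeats already-covered concepts yields little gain, which naturally suppresses redundancy across documents. This is a generic greedy baseline under those assumptions, not the genetic-algorithm or swarm-intelligence framework of the paper above; an ILP formulation would instead solve the same budgeted selection exactly, and refinements used in the literature (for example, comparing against the best single sentence) are omitted.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())


def greedy_coverage_summary(sentences: list[str], budget: int) -> list[str]:
    """Greedy selection for a budgeted weighted-coverage objective (monotone submodular)."""
    concepts = [set(tokenize(s)) for s in sentences]
    costs = [len(tokenize(s)) for s in sentences]
    # Concept weight = number of input sentences that mention the word (an assumption).
    weights = Counter(w for c in concepts for w in c)
    covered: set[str] = set()
    chosen: list[int] = []
    used = 0
    remaining = set(range(len(sentences)))
    while remaining:
        best, best_ratio = None, 0.0
        for i in sorted(remaining):
            if costs[i] == 0 or used + costs[i] > budget:
                continue  # skip empty sentences and ones that would exceed the budget
            # Marginal gain: total weight of concepts not yet covered by the summary.
            gain = sum(weights[w] for w in concepts[i] - covered)
            if gain / costs[i] > best_ratio:
                best, best_ratio = i, gain / costs[i]
        if best is None:
            break  # nothing fits, or nothing adds new coverage
        chosen.append(best)
        covered |= concepts[best]
        used += costs[best]
        remaining.remove(best)
    return [sentences[i] for i in sorted(chosen)]


docs = ["Storm hits coast.", "Storm hits coast hard.", "Rescue teams arrive."]
print(greedy_coverage_summary(docs, budget=6))  # ['Storm hits coast.', 'Rescue teams arrive.']
```

In the usage example the near-duplicate "Storm hits coast hard." is skipped: once the first sentence is chosen, it would add only the concept "hard" and no longer fits the budget, so the remaining words go to new information instead.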
Bridging the gap between extractive and abstractive summaries: Creation and evaluation of coherent extracts from heterogeneous sources
Coherent extracts are a novel type of summary combining the advantages of manually created abstractive summaries, which are fluent but difficult to evaluate, and low-quality automatically created extractive summaries, which lack coherence and structure.
The Next Step for Multi-Document Summarization: A Heterogeneous Multi-Genre Corpus Built with a Novel Construction Approach
In a detailed analysis, we show that our new corpus is significantly different from the homogeneous corpora commonly used, and that it is heterogeneous along several dimensions.
Automatic Argumentative-Zoning Using Word2vec
In comparison with hand-crafted features, the word2vec method performed better for most of the categories.
Neural Extractive Summarization with Side Information
Most extractive summarization methods focus on the main body of the document from which sentences need to be extracted.