Document Summarization
195 papers with code • 7 benchmarks • 28 datasets
Automatic Document Summarization is the task of rewriting a document into a shorter form while still retaining its important content. The two most popular paradigms are extractive and abstractive approaches. Extractive approaches generate summaries by extracting parts of the original document (usually sentences), while abstractive methods may generate new words or phrases that are not in the original document.
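The extractive paradigm can be sketched in a few lines. The scoring scheme below (rank sentences by the average corpus frequency of their words) is purely illustrative, not any specific published method:

```python
from collections import Counter
import re

def extractive_summary(document: str, num_sentences: int = 2) -> str:
    """Pick the top-scoring sentences and return them in document order."""
    # Naive sentence split on ., !, ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    freq = Counter(re.findall(r"\w+", document.lower()))

    def score(sent: str) -> float:
        # Average frequency of the sentence's words across the document.
        toks = re.findall(r"\w+", sent.lower())
        return sum(freq[t] for t in toks) / max(len(toks), 1)

    keep = set(sorted(sentences, key=score, reverse=True)[:num_sentences])
    # Emit selected sentences in their original order for readability.
    return " ".join(s for s in sentences if s in keep)
```

Abstractive methods, by contrast, generate the summary token by token with a language model rather than copying sentences, so no comparably small sketch exists for them.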
Libraries
Use these libraries to find Document Summarization models and implementations.
Latest papers
Investigating Text Shortening Strategy in BERT: Truncation vs Summarization
In this study, we investigate the performance of document truncation and summarization in text classification tasks.
Detect-Order-Construct: A Tree Construction based Approach for Hierarchical Document Structure Analysis
Our end-to-end system achieves state-of-the-art performance on two large-scale document layout analysis datasets (PubLayNet and DocLayNet), a high-quality hierarchical document structure reconstruction dataset (HRDoc), and our Comp-HRDoc benchmark.
Shaping Political Discourse using multi-source News Summarization
Multi-document summarization is the process of automatically generating a concise summary of multiple documents related to the same topic.
OpenAsp: A Benchmark for Multi-document Open Aspect-based Summarization
To advance research on more realistic scenarios, we introduce OpenAsp, a benchmark for multi-document open aspect-based summarization.
Supervising the Centroid Baseline for Extractive Multi-Document Summarization
The centroid method is a simple approach for extractive multi-document summarization and many improvements to its pipeline have been proposed.
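The centroid baseline referenced above can be sketched as follows: represent each sentence as a bag-of-words vector, average the vectors into a "centroid" of the document set, and rank sentences by cosine similarity to that centroid. Real pipelines typically use tf-idf weights and redundancy filtering; this stripped-down version is only an illustration of the core idea:

```python
import math
import re
from collections import Counter

def centroid_rank(sentences: list[str]) -> list[str]:
    """Rank sentences by cosine similarity to the mean bag-of-words vector."""
    vecs = [Counter(re.findall(r"\w+", s.lower())) for s in sentences]

    # Centroid: average count of each term across all sentences.
    totals = Counter()
    for v in vecs:
        totals.update(v)
    n = len(vecs)
    centroid = {t: c / n for t, c in totals.items()}
    c_norm = math.sqrt(sum(c * c for c in centroid.values()))

    def cosine(v: Counter) -> float:
        dot = sum(cnt * centroid.get(t, 0.0) for t, cnt in v.items())
        v_norm = math.sqrt(sum(cnt * cnt for cnt in v.values()))
        return dot / (v_norm * c_norm) if v_norm and c_norm else 0.0

    scored = sorted(zip(sentences, vecs), key=lambda p: cosine(p[1]), reverse=True)
    return [s for s, _ in scored]
```

A summary is then formed by taking the top-ranked sentences, usually with a length budget and a check that each new sentence is not too similar to those already selected.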
Embrace Divergence for Richer Insights: A Multi-document Summarization Benchmark and a Case Study on Summarizing Diverse Information from News Articles
In this paper, we propose a new task of summarizing diverse information encountered in multiple news articles encompassing the same event.
ODSum: New Benchmarks for Open Domain Multi-Document Summarization
Open-domain Multi-Document Summarization (ODMDS) is a critical tool for condensing vast arrays of documents into coherent, concise summaries.
Gender Bias in News Summarization: Measures, Pitfalls and Corpora
Summarization is an important application of large language models (LLMs).
Extending Context Window of Large Language Models via Positional Interpolation
We present Position Interpolation (PI), which extends the context window of RoPE-based pretrained LLMs such as LLaMA to up to 32768 tokens with minimal fine-tuning (within 1000 steps). PI demonstrates strong empirical results on tasks that require long context, including passkey retrieval, language modeling, and long document summarization, across LLaMA models from 7B to 65B.
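The core of Position Interpolation is a one-line change: instead of feeding position index m directly into the rotary embedding, rescale it by train_len / extended_len so that every position in the extended window falls inside the range seen during pretraining. A minimal sketch (the dimension and lengths below are illustrative defaults, not LLaMA's actual configuration):

```python
import math

def rope_angles(position: int, dim: int, base: float = 10000.0,
                train_len: int = 2048, extended_len: int = 2048) -> list[float]:
    """Rotation angles for one position under Position Interpolation.

    When extended_len > train_len, positions in [0, extended_len) are
    linearly squeezed into the pretrained range [0, train_len).
    """
    scaled = position * train_len / extended_len
    # Standard RoPE frequency schedule over dim // 2 rotation pairs.
    return [scaled / base ** (2 * i / dim) for i in range(dim // 2)]
```

With this scaling, position 8192 in a 4x-extended window produces exactly the same angles as position 2048 did during pretraining, which is why only light fine-tuning is needed.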
Pre-training Meets Clustering: A Hybrid Extractive Multi-document Summarization Model
Results show that our proposed model substantially outperforms existing unsupervised state-of-the-art approaches.