Text Summarization

369 papers with code • 33 benchmarks • 88 datasets

Text Summarization is a natural language processing (NLP) task that involves condensing a lengthy text document into a shorter, more compact version while still retaining the most important information and meaning. The goal is to produce a summary that accurately represents the content of the original text in a concise form.

There are different approaches to text summarization, including extractive methods that identify and extract important sentences or phrases from the text, and abstractive methods that generate new text based on the content of the original text.
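As a rough illustration of the two families, the sketch below pairs a frequency-based extractive heuristic with an abstractive call to a pretrained seq2seq model through the Hugging Face transformers pipeline; the sentence-scoring rule and the facebook/bart-large-cnn checkpoint are illustrative choices, not drawn from any of the papers listed here.

```python
# Minimal sketch contrasting extractive and abstractive summarization.
# The frequency heuristic and the model checkpoint are illustrative choices.
from collections import Counter
from transformers import pipeline

def extractive_summary(text, num_sentences=2):
    """Pick the sentences with the highest average word frequency."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    freqs = Counter(text.lower().split())
    scored = sorted(
        sentences,
        key=lambda s: sum(freqs[w] for w in s.lower().split()) / max(len(s.split()), 1),
        reverse=True,
    )
    return ". ".join(scored[:num_sentences]) + "."

def abstractive_summary(text):
    """Generate a new summary with a pretrained seq2seq model."""
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    return summarizer(text, max_length=60, min_length=10, do_sample=False)[0]["summary_text"]

document = (
    "Text summarization condenses a long document into a shorter version. "
    "Extractive methods copy important sentences from the source. "
    "Abstractive methods generate new sentences that paraphrase the source."
)
print(extractive_summary(document))
print(abstractive_summary(document))
```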

Most implemented papers

Evaluating the Factual Consistency of Abstractive Text Summarization

yuhui-zh15/FactCCX EMNLP 2020

Currently used metrics for assessing summarization algorithms do not account for whether summaries are factually consistent with source documents.
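In the same spirit, a consistency check can be approximated with an off-the-shelf natural language inference model, treating the source as the premise and the summary as the hypothesis. The sketch below is only a generic stand-in, assuming the roberta-large-mnli checkpoint and the heuristic that a high entailment score suggests consistency; it is not the authors' FactCC/FactCCX model.

```python
# Hedged sketch of an NLI-style factual-consistency check, in the spirit of
# FactCC but not the authors' model or code: the checkpoint and the
# "entailment implies consistency" heuristic are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "roberta-large-mnli"  # generic NLI checkpoint, not FactCC itself
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

source = "The company reported a 10% rise in quarterly revenue."
summary = "Quarterly revenue fell by 10%."

# Treat the source document as the premise and the summary as the hypothesis.
inputs = tokenizer(source, summary, return_tensors="pt", truncation=True)
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1).squeeze()

for label, p in zip(model.config.id2label.values(), probs.tolist()):
    print(f"{label}: {p:.3f}")  # a high ENTAILMENT score suggests consistency
```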

ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training

microsoft/ProphetNet 13 Jan 2020

This paper presents a new sequence-to-sequence pre-training model called ProphetNet, which introduces a novel self-supervised objective, future n-gram prediction, together with an n-stream self-attention mechanism.
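A pretrained ProphetNet checkpoint can be used for summarization through the transformers library; the sketch below assumes the microsoft/prophetnet-large-uncased-cnndm checkpoint from the model hub and standard beam-search generation.

```python
# Minimal generation sketch with ProphetNet via Hugging Face transformers;
# the CNN/DailyMail fine-tuned checkpoint name is assumed from the model hub.
from transformers import ProphetNetForConditionalGeneration, ProphetNetTokenizer

checkpoint = "microsoft/prophetnet-large-uncased-cnndm"
tokenizer = ProphetNetTokenizer.from_pretrained(checkpoint)
model = ProphetNetForConditionalGeneration.from_pretrained(checkpoint)

article = (
    "ProphetNet is pretrained with a future n-gram prediction objective, "
    "so at each step it is trained to predict the next several tokens at once."
)
inputs = tokenizer(article, return_tensors="pt", truncation=True)
summary_ids = model.generate(**inputs, num_beams=4, max_length=60, early_stopping=True)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```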

ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation

PaddlePaddle/ERNIE 26 Jan 2020

Current pre-training works in natural language generation pay little attention to the problem of exposure bias on downstream tasks.

BARThez: a Skilled Pretrained French Sequence-to-Sequence Model

moussaKam/BARThez EMNLP 2021

We show BARThez to be very competitive with state-of-the-art BERT-based French language models such as CamemBERT and FlauBERT.

PanGu-$\alpha$: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation

mindspore-ai/models 26 Apr 2021

To enhance the generalization ability of PanGu-$\alpha$, we collect 1.1TB of high-quality Chinese data from a wide range of domains to pretrain the model.

OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

ofa-sys/ofa 7 Feb 2022

In this work, we pursue a unified paradigm for multimodal pretraining to break the scaffolds of complex task/modality-specific customization.

LCSTS: A Large Scale Chinese Short Text Summarization Dataset

CLUEbenchmark/CLGE EMNLP 2015

Automatic text summarization is widely regarded as a highly difficult problem, partly because of the lack of large text summarization datasets.

A Regularized Framework for Sparse and Structured Neural Attention

vene/sparse-structured-attention NeurIPS 2017

Modern neural networks are often augmented with an attention mechanism, which tells the network where to focus within the input.
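One concrete example of such a sparse attention mapping is sparsemax (Martins & Astudillo, 2016), which this framework generalizes: instead of softmax's dense weights, it projects the scores onto the probability simplex so that some weights become exactly zero. The NumPy sketch below is a minimal illustration, not the authors' vene/sparse-structured-attention library.

```python
# Hedged sketch of sparsemax, one of the sparse attention mappings studied in
# this line of work (not the authors' library; see
# vene/sparse-structured-attention for their implementations).
import numpy as np

def sparsemax(z):
    """Project the score vector z onto the probability simplex."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    support = 1 + k * z_sorted > cumsum            # which coordinates stay nonzero
    k_z = k[support][-1]
    tau = (cumsum[support][-1] - 1) / k_z          # threshold shared by the support
    return np.maximum(z - tau, 0.0)

scores = [2.0, 1.0, 0.1, -1.0]
print(sparsemax(scores))                           # sparse: some weights are exactly 0
print(np.exp(scores) / np.exp(scores).sum())       # softmax: always dense
```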

Data-driven Summarization of Scientific Articles

ninikolov/data-driven-summarization 24 Apr 2018

Data-driven approaches to sequence-to-sequence modelling have been successfully applied to short text summarization of news articles.

Deep Reinforcement Learning For Sequence to Sequence Models

yaserkl/RLSeq2Seq 24 May 2018

In this survey, we consider seq2seq problems from the RL point of view and provide a formulation that combines the decision-making power of RL methods with sequence-to-sequence models' ability to remember long-term context.
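A common instance of this combination is REINFORCE-style training with a self-critical baseline, where a sampled summary is rewarded relative to the greedy decode under a metric such as ROUGE. The toy sketch below illustrates only the loss construction, with random logits standing in for a decoder and a simple overlap reward standing in for ROUGE; it is not the RLSeq2Seq implementation.

```python
# Hedged sketch of the policy-gradient idea surveyed here (self-critical style):
# the reward is a placeholder for a metric such as ROUGE, and the toy "decoder"
# is just random logits; this is not the RLSeq2Seq implementation.
import torch

def reward_fn(prediction, reference):
    """Placeholder reward: token overlap standing in for ROUGE."""
    overlap = len(set(prediction.tolist()) & set(reference.tolist()))
    return overlap / max(len(reference), 1)

vocab_size, seq_len = 50, 6
logits = torch.randn(seq_len, vocab_size, requires_grad=True)  # stand-in decoder outputs
reference = torch.randint(vocab_size, (seq_len,))

# Sampled sequence (exploration) vs. greedy decode (self-critical baseline).
probs = logits.softmax(dim=-1)
sampled = torch.multinomial(probs, num_samples=1).squeeze(-1)
greedy = logits.argmax(dim=-1)

advantage = reward_fn(sampled, reference) - reward_fn(greedy, reference)
log_prob = torch.log(probs[torch.arange(seq_len), sampled]).sum()

# REINFORCE with a self-critical baseline: push up sequences that beat greedy decoding.
loss = -advantage * log_prob
loss.backward()
print(f"advantage={advantage:.3f}, loss={loss.item():.3f}")
```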