Text Summarization

369 papers with code • 33 benchmarks • 88 datasets

Text Summarization is a natural language processing (NLP) task that involves condensing a lengthy text document into a shorter, more compact version while still retaining the most important information and meaning. The goal is to produce a summary that accurately represents the content of the original text in a concise form.

There are different approaches to text summarization, including extractive methods that identify and extract important sentences or phrases from the text, and abstractive methods that generate new text based on the content of the original text.
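As a rough illustration of the two families, the sketch below pairs a frequency-based extractive heuristic with an abstractive call to a pretrained seq2seq model through the Hugging Face transformers pipeline; the sentence-scoring rule and the facebook/bart-large-cnn checkpoint are illustrative choices, not drawn from any of the papers listed here.

```python
# Minimal sketch contrasting extractive and abstractive summarization.
# The frequency heuristic and the model checkpoint are illustrative choices.
from collections import Counter
from transformers import pipeline

def extractive_summary(text, num_sentences=2):
    """Pick the sentences with the highest average word frequency."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    freqs = Counter(text.lower().split())
    scored = sorted(
        sentences,
        key=lambda s: sum(freqs[w] for w in s.lower().split()) / max(len(s.split()), 1),
        reverse=True,
    )
    return ". ".join(scored[:num_sentences]) + "."

def abstractive_summary(text):
    """Generate a new summary with a pretrained seq2seq model."""
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    return summarizer(text, max_length=60, min_length=10, do_sample=False)[0]["summary_text"]

document = (
    "Text summarization condenses a long document into a shorter version. "
    "Extractive methods copy important sentences from the source. "
    "Abstractive methods generate new sentences that paraphrase the source."
)
print(extractive_summary(document))
print(abstractive_summary(document))
```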

Most implemented papers

Evaluating the Factual Consistency of Abstractive Text Summarization

yuhui-zh15/FactCCX EMNLP 2020

Currently used metrics for assessing summarization algorithms do not account for whether summaries are factually consistent with source documents.
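In the same spirit, a consistency check can be approximated with an off-the-shelf natural language inference model, treating the source as the premise and the summary as the hypothesis. The sketch below is only a generic stand-in, assuming the roberta-large-mnli checkpoint and the heuristic that a high entailment score suggests consistency; it is not the authors' FactCC/FactCCX model.

```python
# Hedged sketch of an NLI-style factual-consistency check, in the spirit of
# FactCC but not the authors' model or code: the checkpoint and the
# "entailment implies consistency" heuristic are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "roberta-large-mnli"  # generic NLI checkpoint, not FactCC itself
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

source = "The company reported a 10% rise in quarterly revenue."
summary = "Quarterly revenue fell by 10%."

# Treat the source document as the premise and the summary as the hypothesis.
inputs = tokenizer(source, summary, return_tensors="pt", truncation=True)
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1).squeeze()

for label, p in zip(model.config.id2label.values(), probs.tolist()):
    print(f"{label}: {p:.3f}")  # a high ENTAILMENT score suggests consistency
```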

ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training

microsoft/ProphetNet 13 Jan 2020

This paper presents a new sequence-to-sequence pre-training model called ProphetNet, which introduces a novel self-supervised objective, future n-gram prediction, together with an n-stream self-attention mechanism.
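A pretrained ProphetNet checkpoint can be used for summarization through the transformers library; the sketch below assumes the microsoft/prophetnet-large-uncased-cnndm checkpoint from the model hub and standard beam-search generation.

```python
# Minimal generation sketch with ProphetNet via Hugging Face transformers;
# the CNN/DailyMail fine-tuned checkpoint name is assumed from the model hub.
from transformers import ProphetNetForConditionalGeneration, ProphetNetTokenizer

checkpoint = "microsoft/prophetnet-large-uncased-cnndm"
tokenizer = ProphetNetTokenizer.from_pretrained(checkpoint)
model = ProphetNetForConditionalGeneration.from_pretrained(checkpoint)

article = (
    "ProphetNet is pretrained with a future n-gram prediction objective, "
    "so at each step it is trained to predict the next several tokens at once."
)
inputs = tokenizer(article, return_tensors="pt", truncation=True)
summary_ids = model.generate(**inputs, num_beams=4, max_length=60, early_stopping=True)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```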

ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation

PaddlePaddle/ERNIE 26 Jan 2020

Current pre-training works in natural language generation pay little attention to the problem of exposure bias on downstream tasks.

BARThez: a Skilled Pretrained French Sequence-to-Sequence Model

moussaKam/BARThez EMNLP 2021

We show BARThez to be very competitive with state-of-the-art BERT-based French language models such as CamemBERT and FlauBERT.

PanGu-$\alpha$: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation

mindspore-ai/models 26 Apr 2021

To enhance the generalization ability of PanGu-$\alpha$, we collect 1.1TB of high-quality Chinese data from a wide range of domains to pretrain the model.

OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

ofa-sys/ofa 7 Feb 2022

In this work, we pursue a unified paradigm for multimodal pretraining to break the scaffolds of complex task/modality-specific customization.

LCSTS: A Large Scale Chinese Short Text Summarization Dataset

CLUEbenchmark/CLGE EMNLP 2015

Automatic text summarization is widely regarded as a highly difficult problem, partly because of the lack of large text summarization datasets.

A Regularized Framework for Sparse and Structured Neural Attention

vene/sparse-structured-attention NeurIPS 2017

Modern neural networks are often augmented with an attention mechanism, which tells the network where to focus within the input.
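One concrete example of such a sparse attention mapping is sparsemax (Martins & Astudillo, 2016), which this framework generalizes: instead of softmax's dense weights, it projects the scores onto the probability simplex so that some weights become exactly zero. The NumPy sketch below is a minimal illustration, not the authors' vene/sparse-structured-attention library.

```python
# Hedged sketch of sparsemax, one of the sparse attention mappings studied in
# this line of work (not the authors' library; see
# vene/sparse-structured-attention for their implementations).
import numpy as np

def sparsemax(z):
    """Project the score vector z onto the probability simplex."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    support = 1 + k * z_sorted > cumsum            # which coordinates stay nonzero
    k_z = k[support][-1]
    tau = (cumsum[support][-1] - 1) / k_z          # threshold shared by the support
    return np.maximum(z - tau, 0.0)

scores = [2.0, 1.0, 0.1, -1.0]
print(sparsemax(scores))                           # sparse: some weights are exactly 0
print(np.exp(scores) / np.exp(scores).sum())       # softmax: always dense
```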

Data-driven Summarization of Scientific Articles

ninikolov/data-driven-summarization 24 Apr 2018

Data-driven approaches to sequence-to-sequence modelling have been successfully applied to short text summarization of news articles.

Deep Reinforcement Learning For Sequence to Sequence Models

yaserkl/RLSeq2Seq 24 May 2018

In this survey, we consider seq2seq problems from the RL point of view and provide a formulation that combines the decision-making power of RL methods with sequence-to-sequence models' ability to remember long-term context.
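A common instance of this combination is REINFORCE-style training with a self-critical baseline, where a sampled summary is rewarded relative to the greedy decode under a metric such as ROUGE. The toy sketch below illustrates only the loss construction, with random logits standing in for a decoder and a simple overlap reward standing in for ROUGE; it is not the RLSeq2Seq implementation.

```python
# Hedged sketch of the policy-gradient idea surveyed here (self-critical style):
# the reward is a placeholder for a metric such as ROUGE, and the toy "decoder"
# is just random logits; this is not the RLSeq2Seq implementation.
import torch

def reward_fn(prediction, reference):
    """Placeholder reward: token overlap standing in for ROUGE."""
    overlap = len(set(prediction.tolist()) & set(reference.tolist()))
    return overlap / max(len(reference), 1)

vocab_size, seq_len = 50, 6
logits = torch.randn(seq_len, vocab_size, requires_grad=True)  # stand-in decoder outputs
reference = torch.randint(vocab_size, (seq_len,))

# Sampled sequence (exploration) vs. greedy decode (self-critical baseline).
probs = logits.softmax(dim=-1)
sampled = torch.multinomial(probs, num_samples=1).squeeze(-1)
greedy = logits.argmax(dim=-1)

advantage = reward_fn(sampled, reference) - reward_fn(greedy, reference)
log_prob = torch.log(probs[torch.arange(seq_len), sampled]).sum()

# REINFORCE with a self-critical baseline: push up sequences that beat greedy decoding.
loss = -advantage * log_prob
loss.backward()
print(f"advantage={advantage:.3f}, loss={loss.item():.3f}")
```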