Text Segmentation

34 papers with code • 3 benchmarks • 7 datasets

Text segmentation deals with the correct division of a document into semantically coherent blocks.

Benchmarks

These leaderboards track progress in text segmentation.

Dataset	Best Model
YTSeg	MiniSeg (pretrained on Wiki-727K)
SPMRL Hebrew segmentation data	RFTokenizer
Wiki5K Hebrew segmentation	RFTokenizer

Most implemented papers

Handwritten Text Segmentation via End-to-End Learning of Convolutional Neural Network

gregbugaj/unet-denoiser • 12 Jun 2019

For training our network, we develop a cross-entropy based loss function that addresses the imbalance problems.
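
The excerpt above does not spell out the loss, so the following is only a minimal sketch of one common way to counter class imbalance: a binary cross-entropy in which the rare (text) class is up-weighted by the inverse class ratio. The function names `weighted_cross_entropy` and `inverse_frequency_weight` are hypothetical, and this is not claimed to be the loss used in the paper.

```python
import math


def inverse_frequency_weight(labels):
    """Return the negative-to-positive ratio, used as the positive-class weight.

    Assumption for illustration: labels are 0 (background) or 1 (text).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    return negatives / positives if positives else 1.0


def weighted_cross_entropy(probs, labels, pos_weight):
    """Mean binary cross-entropy with the positive class scaled by pos_weight."""
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        if y == 1:
            total += -pos_weight * math.log(p)
        else:
            total += -math.log(1.0 - p)
    return total / len(labels)


if __name__ == "__main__":
    labels = [1, 0, 0, 0]          # one text pixel, three background pixels
    probs = [0.5, 0.5, 0.5, 0.5]   # predicted probability of the text class
    weight = inverse_frequency_weight(labels)
    print(weight, weighted_cross_entropy(probs, labels, weight))
```

With this weighting, the single text pixel contributes as much to the loss as the three background pixels combined, which is the usual motivation for class weights when one class dominates.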

Crowdsourcing and Aggregating Nested Markable Annotations

juntaoy/dali-preprocessing-pipeline • ACL 2019

One of the key steps in language resource creation is the identification of the text segments to be annotated, or markables, which depending on the task may vary from nominal chunks for named entity resolution to (potentially nested) noun phrases in coreference resolution (or mentions) to larger text segments in text segmentation.

Two-Level Transformer and Auxiliary Coherence Modeling for Improved Text Segmentation

EducationalTestingService/CATS • 3 Jan 2020

Breaking down the structure of long texts into semantically coherent segments makes the texts more readable and supports downstream applications like summarization and retrieval.

Text Segmentation by Cross Segment Attention

aakash222/text-segmentation-NLP • EMNLP 2020

Document and discourse segmentation are two fundamental NLP tasks pertaining to breaking up text into constituents, which are commonly used to help downstream tasks such as information retrieval or text summarization.

Improving Segmentation for Technical Support Problems

kushalchauhan98/ticket-segmentation • ACL 2020

We formulate the problem as a sequence labelling task, and study the performance of state of the art approaches.
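
To illustrate what casting segmentation as sequence labelling can look like, here is a minimal sketch that maps segments to per-line tags and back. The B/I tag scheme and the helper names `segments_to_labels` and `labels_to_segments` are assumptions made for illustration; the excerpt does not state which label set the paper uses.

```python
def segments_to_labels(segments):
    """Flatten segments (each a list of lines) into lines plus B/I tags.

    "B" marks the first line of a segment, "I" marks every following line.
    """
    lines, labels = [], []
    for segment in segments:
        for i, line in enumerate(segment):
            lines.append(line)
            labels.append("B" if i == 0 else "I")
    return lines, labels


def labels_to_segments(lines, labels):
    """Rebuild segments from per-line B/I tags (the inverse mapping)."""
    segments = []
    for line, label in zip(lines, labels):
        # Start a new segment on "B", or if no segment has been opened yet.
        if label == "B" or not segments:
            segments.append([line])
        else:
            segments[-1].append(line)
    return segments


if __name__ == "__main__":
    ticket = [
        ["App crashes on start.", "It worked yesterday."],
        ["Traceback (most recent call last):"],
    ]
    lines, labels = segments_to_labels(ticket)
    print(labels)
    print(labels_to_segments(lines, labels) == ticket)
```

Once the data is in this form, any tagger that predicts one label per line can be trained on it, and its predictions can be turned back into segments with the inverse mapping.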

Chapter Captor: Text Segmentation in Novels

cpethe/chapter-captor • EMNLP 2020

Books are typically segmented into chapters and sections, representing coherent subnarratives and topics.

Interpretable Natural Language Segmentation Based on Link Grammar

aigents/aigents-java-nlp • 14 Nov 2020

Natural language segmentation (NLS), or text segmentation, refers to the process of dividing written text into meaningful units.

Rethinking Text Segmentation: A Novel Dataset and A Text-Specific Refinement Approach

SHI-Labs/Rethinking-Text-Segmentation • CVPR 2021

We also introduce Text Refinement Network (TexRNet), a novel text segmentation approach that adapts to the unique properties of text, e.g. non-convex boundary, diverse texture, etc., which often impose burdens on traditional segmentation models.

Hierarchical Text Segmentation for Medieval Manuscripts

hazemamir/greedy_text_segmentation • COLING 2020

In this paper, we address the segmentation of books of hours, Latin devotional manuscripts of the late Middle Ages, that exhibit challenging issues: a complex hierarchical entangled structure, variable content, noisy transcriptions with no sentence markers, and strong correlations between sections for which topical information is no longer sufficient to draw segmentation boundaries.

Structural Text Segmentation of Legal Documents

dennlinger/TopicalChange • 7 Dec 2020

The growing complexity of legal cases has led to an increasing interest in legal information retrieval systems that can effectively satisfy user-specific information needs.
