Text Segmentation
34 papers with code • 3 benchmarks • 7 datasets
Text segmentation deals with the correct division of a document into semantically coherent blocks.
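As a rough illustration of the task (not taken from any of the listed papers), a segmentation can be encoded as one boundary label per sentence: 1 if the sentence opens a new block, 0 otherwise. This is the usual way to cast segmentation as sequence labelling. The function names and the toy document below are hypothetical and chosen only for this sketch.

```python
from typing import List


def segments_to_labels(segments: List[List[str]]) -> List[int]:
    """Label each sentence 1 if it starts a new segment, else 0."""
    labels: List[int] = []
    for segment in segments:
        for position, _ in enumerate(segment):
            labels.append(1 if position == 0 else 0)
    return labels


def labels_to_segments(sentences: List[str], labels: List[int]) -> List[List[str]]:
    """Rebuild segments from a flat sentence list and its boundary labels."""
    segments: List[List[str]] = []
    for sentence, label in zip(sentences, labels):
        # Open a new segment on a boundary label, or if none exists yet.
        if label == 1 or not segments:
            segments.append([])
        segments[-1].append(sentence)
    return segments


# Toy document (hypothetical): two topically coherent blocks.
doc = [["Cats purr.", "Cats nap often."], ["Stocks fell today."]]
labels = segments_to_labels(doc)
flat_sentences = [sentence for segment in doc for sentence in segment]
restored = labels_to_segments(flat_sentences, labels)
```

Under this encoding, a model only has to predict one binary label per sentence, and the predicted labels can be mapped back to blocks with `labels_to_segments`.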
Most implemented papers
Handwritten Text Segmentation via End-to-End Learning of Convolutional Neural Network
For training our network, we develop a cross-entropy based loss function that addresses the imbalance problems.
Crowdsourcing and Aggregating Nested Markable Annotations
One of the key steps in language resource creation is the identification of the text segments to be annotated, or markables, which, depending on the task, may vary from nominal chunks for named entity resolution to (potentially nested) noun phrases (or mentions) in coreference resolution to larger text segments in text segmentation.
Two-Level Transformer and Auxiliary Coherence Modeling for Improved Text Segmentation
Breaking down the structure of long texts into semantically coherent segments makes the texts more readable and supports downstream applications like summarization and retrieval.
Text Segmentation by Cross Segment Attention
Document and discourse segmentation are two fundamental NLP tasks pertaining to breaking up text into constituents, which are commonly used to help downstream tasks such as information retrieval or text summarization.
Improving Segmentation for Technical Support Problems
We formulate the problem as a sequence labelling task and study the performance of state-of-the-art approaches.
Chapter Captor: Text Segmentation in Novels
Books are typically segmented into chapters and sections, representing coherent subnarratives and topics.
Interpretable Natural Language Segmentation Based on Link Grammar
Natural language segmentation (NLS), or text segmentation, refers to the process of dividing written text into meaningful units.
Rethinking Text Segmentation: A Novel Dataset and A Text-Specific Refinement Approach
We also introduce Text Refinement Network (TexRNet), a novel text segmentation approach that adapts to the unique properties of text, e.g., non-convex boundaries and diverse textures, which often impose burdens on traditional segmentation models.
Hierarchical Text Segmentation for Medieval Manuscripts
In this paper, we address the segmentation of books of hours, Latin devotional manuscripts of the late Middle Ages, that exhibit challenging issues: a complex hierarchical entangled structure, variable content, noisy transcriptions with no sentence markers, and strong correlations between sections for which topical information is no longer sufficient to draw segmentation boundaries.
Structural Text Segmentation of Legal Documents
The growing complexity of legal cases has led to an increasing interest in legal information retrieval systems that can effectively satisfy user-specific information needs.