Sentence segmentation

19 papers with code • 1 benchmark • 3 datasets

Most implemented papers

Ascle: A Python Natural Language Processing Toolkit for Medical Text Generation

yale-lily/medgen 28 Nov 2023

This study introduces Ascle, a pioneering natural language processing (NLP) toolkit designed for medical text generation.

Lexical Semantic Recognition

nert-nlp/streusle ACL (MWE) 2021

In lexical semantics, full-sentence segmentation and segment labeling of various phenomena are generally treated separately, despite their interdependence.

Abstractive Summarization of Spoken and Written Instructions with BERT

nlpyang/PreSumm KDD Converse 2020

Summarization of speech is a difficult problem due to the spontaneity of the flow, disfluencies, and other issues that are not usually encountered in written texts.

Universal Dependency Parsing from Scratch

stanfordnlp/stanfordnlp CoNLL 2018

This paper describes Stanford's system at the CoNLL 2018 UD Shared Task.

Fine-Grained Argument Unit Recognition and Classification

trtm/AURC 22 Apr 2019

In this work, we argue that the task should be performed on a more fine-grained level of sequence labeling.

Using Punkt for Sentence Segmentation in non-Latin Scripts: Experiments on Kurdish (Sorani) Texts

KurdishBLARK/KTC-Segmented 9 Apr 2020

The Kurdish language is a multi-dialect, under-resourced language which is written in different scripts.

Abstractive Summarization of Spoken and Written Instructions with BERT

alebryvas/berk266 21 Aug 2020

Summarization of speech is a difficult problem due to the spontaneity of the flow, disfluencies, and other issues that are not usually encountered in written texts.

Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation

csebuetnlp/banglanmt EMNLP 2020

With the segmenter and the two methods combined, we compile a high-quality Bengali-English parallel corpus comprising 2.75 million sentence pairs, more than 2 million of which were not available before.

Evaluating Sentence Segmentation and Word Tokenization Systems on Estonian Web Texts

ksirts/EWTB_sentence_seg 16 Nov 2020

Texts obtained from the web are noisy and do not necessarily follow the orthographic sentence and word boundary rules.