Chinese Word Segmentation

48 papers with code • 6 benchmarks • 3 datasets

Chinese word segmentation is the task of splitting Chinese text (i.e. a sequence of Chinese characters) into words (Source: www.nlpprogress.com).
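To make the task definition concrete, here is a minimal sketch of dictionary-based forward maximum matching, a classic baseline that is not the method of any paper listed on this page. The function name `forward_max_match`, the toy `LEXICON`, and the `max_len` parameter are assumptions made up for illustration.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Greedy left-to-right segmentation: at each position, take the
    longest lexicon entry that matches (hypothetical helper)."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single character is the fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Toy lexicon, invented for this example only.
LEXICON = {"我", "喜欢", "自然", "语言", "自然语言", "处理", "自然语言处理"}

print(forward_max_match("我喜欢自然语言处理", LEXICON, max_len=6))
print(forward_max_match("我喜欢自然语言处理", LEXICON, max_len=2))
```

The same character sequence yields a coarser or finer segmentation depending on the lexicon and `max_len`, which hints at why annotation criteria and granularity come up repeatedly in the papers below.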

Latest papers with no code

That Slepen Al the Nyght with Open Ye! Cross-era Sequence Segmentation with Switch-memory

no code yet • ACL ARR November 2021

However, a considerable number of texts are written in the languages of different eras, which creates obstacles for natural language processing tasks such as word segmentation and machine translation.

Improving Prosody for Unseen Texts in Speech Synthesis by Utilizing Linguistic Information and Noisy Data

no code yet • 15 Nov 2021

Recent advancements in end-to-end speech synthesis have made it possible to generate highly natural speech.

Span Labeling Approach for Vietnamese and Chinese Word Segmentation

no code yet • 1 Oct 2021

In this paper, we propose a span labeling approach to model n-gram information for Vietnamese word segmentation, namely SpanSeg.

MVP-BERT: Multi-Vocab Pre-training for Chinese BERT

no code yet • ACL 2021

Although the development of pre-trained language models (PLMs) has significantly raised the performance of various Chinese natural language processing (NLP) tasks, the vocabulary (vocab) for these Chinese PLMs remains the one provided by Google Chinese BERT, which is based on Chinese characters (chars).

Bidirectional LSTM-CRF Attention-based Model for Chinese Word Segmentation

no code yet • 20 May 2021

Chinese word segmentation (CWS) is the basis of Chinese natural language processing (NLP).

A More Efficient Chinese Named Entity Recognition base on BERT and Syntactic Analysis

no code yet • 11 Jan 2021

We propose a new Named entity recognition (NER) method to effectively make use of the results of Part-of-speech (POS) tagging, Chinese word segmentation (CWS) and parsing while avoiding NER error caused by POS tagging error.

Segmenting Natural Language Sentences via Lexical Unit Analysis

no code yet • Findings (EMNLP) 2021

In this work, we present Lexical Unit Analysis (LUA), a framework for general sequence segmentation tasks.

Multi-grained Chinese Word Segmentation with Weakly Labeled Data

no code yet • COLING 2020

Detailed evaluation shows that our proposed model with weakly labeled data significantly outperforms the state-of-the-art MWS model by 1.12 and 5.97 in F1 on the NEWS and BAIKE data, respectively.
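For readers unfamiliar with how F1 is typically computed for segmentation, the sketch below scores a predicted segmentation against a gold one by comparing word spans. It shows the common word-level F1 for single-grained segmentation; the multi-grained evaluation used in the paper above may differ in detail. The helper names `to_spans` and `segmentation_f1` and the example sentences are assumptions for illustration.

```python
def to_spans(words):
    """Convert a word list into a set of (start, end) character offsets."""
    spans = set()
    start = 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def segmentation_f1(gold, pred):
    """Word-level F1: a predicted word counts as correct only if both of
    its boundaries match a gold word exactly."""
    gold_spans, pred_spans = to_spans(gold), to_spans(pred)
    correct = len(gold_spans & pred_spans)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_spans)
    recall = correct / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


# Invented example: the prediction splits one gold word into two.
gold = ["我", "喜欢", "自然语言", "处理"]
pred = ["我", "喜欢", "自然", "语言", "处理"]
print(round(segmentation_f1(gold, pred), 4))
```

Here three of the five predicted words match gold spans, giving precision 3/5 and recall 3/4.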

Towards Fast and Accurate Neural Chinese Word Segmentation with Multi-Criteria Learning

no code yet • COLING 2020

The ambiguous annotation criteria lead to divergence of Chinese Word Segmentation (CWS) datasets in various granularities.

MVP-BERT: Redesigning Vocabularies for Chinese BERT and Multi-Vocab Pretraining

no code yet • 17 Nov 2020

Although the development of pre-trained language models (PLMs) has significantly raised the performance of various Chinese natural language processing (NLP) tasks, the vocabulary for these Chinese PLMs remains the one provided by Google Chinese BERT (Devlin et al., 2018), which is based on Chinese characters.