Chinese Word Segmentation

48 papers with code • 6 benchmarks • 3 datasets

Chinese word segmentation is the task of splitting Chinese text (i.e. a sequence of Chinese characters) into words (Source: www.nlpprogress.com).
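As a rough illustration of the task, the sketch below segments a sentence by greedy forward maximum matching against a word list. It is a classic dictionary baseline, not the method of any model on the leaderboards below; the function name `forward_max_match`, the `max_len` parameter, and the toy `LEXICON` are assumptions made up for this example.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Greedy longest-match segmentation.

    Characters not covered by the lexicon become single-character words.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a lexicon hit
        # or a single character is reached.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy lexicon, for illustration only.
LEXICON = {"我", "喜欢", "自然", "语言", "自然语言", "处理", "自然语言处理"}

segments = forward_max_match("我喜欢自然语言处理", LEXICON, max_len=6)
print(" / ".join(segments))
```

Because the matcher always prefers the longest lexicon entry, its output depends entirely on the word list it is given, which is one reason learned models are used in practice.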

Benchmarks

These leaderboards are used to track progress in Chinese Word Segmentation.

Dataset	Best Model
MSR	BABERT-LE
PKU	BABERT-LE
CTB6	LATTE (Linguistic units, lattices, PTMs, GNNs)
MSRA	BABERT-LE
CITYU	WMSeg + ZEN
AS	Glyce + BERT

Datasets

Latest papers with no code

That Slepen Al the Nyght with Open Ye! Cross-era Sequence Segmentation with Switch-memory

no code yet • ACL ARR November 2021

A considerable number of texts are written in languages of different eras, which creates obstacles for natural language processing tasks such as word segmentation and machine translation.

Improving Prosody for Unseen Texts in Speech Synthesis by Utilizing Linguistic Information and Noisy Data

no code yet • 15 Nov 2021

Recent advancements in end-to-end speech synthesis have made it possible to generate highly natural speech.

Span Labeling Approach for Vietnamese and Chinese Word Segmentation

no code yet • 1 Oct 2021

In this paper, we propose a span labeling approach to model n-gram information for Vietnamese word segmentation, named SpanSeg.

MVP-BERT: Multi-Vocab Pre-training for Chinese BERT

no code yet • ACL 2021

Although the development of pre-trained language models (PLMs) has significantly raised performance on various Chinese natural language processing (NLP) tasks, the vocabulary (vocab) for these Chinese PLMs remains the one provided by Google Chinese BERT, which is based on Chinese characters (chars).

Bidirectional LSTM-CRF Attention-based Model for Chinese Word Segmentation

no code yet • 20 May 2021

Chinese word segmentation (CWS) is the foundation of Chinese natural language processing (NLP).

A More Efficient Chinese Named Entity Recognition base on BERT and Syntactic Analysis

no code yet • 11 Jan 2021

We propose a new named entity recognition (NER) method that makes effective use of the results of part-of-speech (POS) tagging, Chinese word segmentation (CWS), and parsing while avoiding NER errors caused by POS tagging errors.

Segmenting Natural Language Sentences via Lexical Unit Analysis

no code yet • Findings (EMNLP) 2021

In this work, we present Lexical Unit Analysis (LUA), a framework for general sequence segmentation tasks.

Multi-grained Chinese Word Segmentation with Weakly Labeled Data

no code yet • COLING 2020

Detailed evaluation shows that our proposed model with weakly labeled data significantly outperforms the state-of-the-art MWS model by 1.12 and 5.97 on NEWS and BAIKE data in F1.

Towards Fast and Accurate Neural Chinese Word Segmentation with Multi-Criteria Learning

no code yet • COLING 2020

Ambiguous annotation criteria cause Chinese Word Segmentation (CWS) datasets to diverge in granularity.

MVP-BERT: Redesigning Vocabularies for Chinese BERT and Multi-Vocab Pretraining

no code yet • 17 Nov 2020

Although the development of pre-trained language models (PLMs) has significantly raised performance on various Chinese natural language processing (NLP) tasks, the vocabulary for these Chinese PLMs remains the one provided by Google Chinese BERT (Devlin et al., 2018), which is based on Chinese characters.
