Chinese Word Segmentation
48 papers with code • 6 benchmarks • 3 datasets
Chinese word segmentation is the task of splitting Chinese text (i.e. a sequence of Chinese characters) into words (Source: www.nlpprogress.com).
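To make the task definition concrete, here is a minimal sketch of one classic baseline, dictionary-based forward maximum matching. It is not the method of any paper listed on this page. The function name `forward_max_match`, the `max_len` parameter, and the toy lexicon are all assumptions made up for illustration.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Greedy longest-match-first segmentation against a toy lexicon.

    Hypothetical helper for illustration only; real systems typically
    use statistical or neural sequence models instead.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until the lexicon
        # contains it; a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Toy lexicon (an assumption, not taken from any benchmark dataset).
TOY_LEXICON = {"我", "喜欢", "自然", "语言", "自然语言", "处理", "自然语言处理"}

segmented = forward_max_match("我喜欢自然语言处理", TOY_LEXICON, max_len=6)
print(" / ".join(segmented))  # 我 / 喜欢 / 自然语言处理
```

The input is an unbroken sequence of characters and the output is a list of words whose concatenation reproduces the input. Greedy matching of this kind depends entirely on lexicon coverage and cannot resolve ambiguous boundaries, which is one reason learned models are used in practice.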
Benchmarks
These leaderboards are used to track progress in Chinese Word Segmentation.
Latest papers with no code
That Slepen Al the Nyght with Open Ye! Cross-era Sequence Segmentation with Switch-memory
However, a considerable number of texts are written in languages of different eras, which creates obstacles for natural language processing tasks such as word segmentation and machine translation.
Improving Prosody for Unseen Texts in Speech Synthesis by Utilizing Linguistic Information and Noisy Data
Recent advancements in end-to-end speech synthesis have made it possible to generate highly natural speech.
Span Labeling Approach for Vietnamese and Chinese Word Segmentation
In this paper, we propose a span labeling approach to model n-gram information for Vietnamese word segmentation, namely SpanSeg.
MVP-BERT: Multi-Vocab Pre-training for Chinese BERT
Although the development of pre-trained language models (PLMs) has significantly raised performance on various Chinese natural language processing (NLP) tasks, the vocabulary (vocab) for these Chinese PLMs remains the one provided by Google Chinese BERT, which is based on Chinese characters (chars).
Bidirectional LSTM-CRF Attention-based Model for Chinese Word Segmentation
Chinese word segmentation (CWS) is the basis of Chinese natural language processing (NLP).
A More Efficient Chinese Named Entity Recognition base on BERT and Syntactic Analysis
We propose a new named entity recognition (NER) method that makes effective use of the results of part-of-speech (POS) tagging, Chinese word segmentation (CWS) and parsing while avoiding NER errors caused by POS tagging errors.
Segmenting Natural Language Sentences via Lexical Unit Analysis
In this work, we present Lexical Unit Analysis (LUA), a framework for general sequence segmentation tasks.
Multi-grained Chinese Word Segmentation with Weakly Labeled Data
Detailed evaluation shows that our proposed model with weakly labeled data significantly outperforms the state-of-the-art MWS model by 1.12 and 5.97 on NEWS and BAIKE data in F1.
Towards Fast and Accurate Neural Chinese Word Segmentation with Multi-Criteria Learning
The ambiguous annotation criteria lead to divergence of Chinese Word Segmentation (CWS) datasets in various granularities.
MVP-BERT: Redesigning Vocabularies for Chinese BERT and Multi-Vocab Pretraining
Although the development of pre-trained language models (PLMs) has significantly raised performance on various Chinese natural language processing (NLP) tasks, the vocabulary for these Chinese PLMs remains the one provided by Google Chinese BERT (Devlin et al., 2018), which is based on Chinese characters.