Chinese word segmentation (CWS) is a fundamental step in Chinese natural language processing.
Moreover, it is shown that reasonable performance can be obtained when ZEN is trained on a small corpus, which is important for applying pre-training techniques to scenarios with limited data.
SOTA for Chinese Word Segmentation on MSR
We present a simple yet elegant solution to train a single joint model on multi-criteria corpora for Chinese Word Segmentation (CWS).
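One way to train a single model on corpora that follow different segmentation criteria is to mark each sentence with its source criterion, so the shared tagger can condition on it. The sketch below assumes that approach: it wraps each sentence in hypothetical `<name>` / `</name>` marker tokens and gives the markers a placeholder tag `"X"`. Both the marker format and the `"X"` tag are illustrative assumptions, not details confirmed by the snippet above.

```python
from typing import Dict, List, Tuple


def bmes_tags(words: List[str]) -> List[str]:
    """Tag every character as B(egin), M(iddle), E(nd) or S(ingle-character word)."""
    tags: List[str] = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def build_joint_corpus(
    corpora: Dict[str, List[List[str]]],
) -> List[Tuple[List[str], List[str]]]:
    """Merge several segmented corpora into one training set for a single tagger.

    Each sentence is wrapped in hypothetical criterion markers such as <pku> ... </pku>,
    so one shared model can learn which segmentation standard to follow.
    The markers receive an assumed placeholder tag "X" that is ignored when decoding.
    """
    joint: List[Tuple[List[str], List[str]]] = []
    for name, sentences in corpora.items():
        for words in sentences:
            chars = [ch for word in words for ch in word]
            tokens = [f"<{name}>"] + chars + [f"</{name}>"]
            labels = ["X"] + bmes_tags(words) + ["X"]
            joint.append((tokens, labels))
    return joint
```

For example, `build_joint_corpus({"pku": [["我", "喜欢"]]})` yields a single pair, `(["<pku>", "我", "喜", "欢", "</pku>"], ["X", "S", "B", "E", "X"])`. Putting the criterion into the input, instead of adding a separate output layer per corpus, lets every corpus share all of the model's parameters.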
However, due to the lack of rich pictographic evidence in glyphs and the weak generalization ability of standard computer vision models on character data, an effective way to utilize the glyph information remains to be found.
Chinese Dependency Parsing, Chinese Named Entity Recognition, Chinese Part-of-Speech Tagging, Chinese Semantic Role Labeling, Chinese Sentence Pair Classification, Chinese Word Segmentation, Dependency Parsing, Document Classification, Image Classification, Language Modelling, Machine Translation, Multi-Task Learning, Part-of-Speech Tagging, Semantic Role Labeling, Semantic Textual Similarity, Sentence Classification, Sentiment Analysis
However, existing methods for Chinese NER either do not exploit word boundary information from CWS or cannot filter out the information that is specific to CWS.
Neural models with minimal feature engineering have achieved competitive performance against traditional methods for the task of Chinese word segmentation.
Most previous approaches to Chinese word segmentation formalize the problem as a character-based sequence labeling task, in which only contextual information within fixed-size local windows and simple interactions between adjacent tags can be captured.
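To make the character-based formulation concrete, the sketch below converts a segmented sentence into per-character tags, decodes tags back into words, and extracts the fixed-size local window of characters that such taggers use as context. It assumes the common BMES tag set (Begin, Middle, End, Single), and the `<PAD>` symbol is a hypothetical padding token. The snippet above does not say which tag set the cited work uses.

```python
from typing import List

PAD = "<PAD>"  # hypothetical padding symbol for window positions outside the sentence


def words_to_bmes(words: List[str]) -> List[str]:
    """Label each character as B(egin), M(iddle), E(nd) of a word, or S(ingle)."""
    tags: List[str] = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def bmes_to_words(chars: List[str], tags: List[str]) -> List[str]:
    """Rebuild words from a tag sequence; a B or S always starts a new word."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        if tag in ("B", "S") and current:
            words.append(current)  # close a word left unfinished by an ill-formed sequence
            current = ""
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


def window(chars: List[str], i: int, size: int = 2) -> List[str]:
    """Return the 2*size+1 characters centered on position i, padded at the edges."""
    return [chars[j] if 0 <= j < len(chars) else PAD for j in range(i - size, i + size + 1)]
```

With `words = ["我", "喜欢", "自然语言", "处理"]`, `words_to_bmes(words)` returns `["S", "B", "E", "B", "M", "M", "E", "B", "E"]`. A window-based classifier predicts each tag from only `window(chars, i)` and, if it adds a CRF layer, from neighboring tags. That is the limited context the sentence above describes.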
The previous lattice LSTM model takes word embeddings as its lexicon input; we prove that subword encoding can give comparable performance and has the benefit of not relying on any external segmenter.
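A subword lexicon can be learned straight from raw, unsegmented text. The sketch below assumes byte-pair encoding (BPE) as the subword method: it repeatedly merges the most frequent adjacent pair of units. A second helper then lists every multi-character lexicon entry that matches a span of a sentence, which is the kind of lattice input a lattice LSTM consumes. The function names, the `min_count` cutoff, and the `max_len` limit are illustrative assumptions.

```python
from collections import Counter
from typing import List, Set, Tuple


def merge_pair(seq: List[str], pair: Tuple[str, str]) -> List[str]:
    """Replace every non-overlapping occurrence of `pair` in `seq` with its concatenation."""
    out: List[str] = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(seq[i] + seq[i + 1])
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_bpe(sentences: List[str], num_merges: int, min_count: int = 2) -> List[Tuple[str, str]]:
    """Learn BPE merges over characters of raw text (no word segmentation needed)."""
    seqs = [list(s) for s in sentences]
    merges: List[Tuple[str, str]] = []
    for _ in range(num_merges):
        pairs: Counter = Counter()
        for seq in seqs:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break
        best, count = pairs.most_common(1)[0]
        if count < min_count:  # stop once no pair is frequent enough to be worth merging
            break
        merges.append(best)
        seqs = [merge_pair(seq, best) for seq in seqs]
    return merges


def lattice_matches(sentence: str, lexicon: Set[str], max_len: int = 4) -> List[Tuple[int, int, str]]:
    """List (start, end, entry) for every multi-character lexicon entry found in the sentence."""
    matches: List[Tuple[int, int, str]] = []
    for start in range(len(sentence)):
        for end in range(start + 2, min(len(sentence), start + max_len) + 1):
            piece = sentence[start:end]
            if piece in lexicon:
                matches.append((start, end, piece))
    return matches
```

For example, `learn_bpe(["中国人民", "中国银行", "中国"], 5)` learns the single merge `("中", "国")`, because every other adjacent pair occurs only once. Building `lexicon = {a + b for a, b in merges}` then gives `lattice_matches("我爱中国", lexicon) == [(2, 4, "中国")]`. No external segmenter is involved at any step.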
Experiments on WMT14 translation tasks demonstrate that ATR-based neural machine translation can yield competitive performance on English-German and English-French language pairs in terms of both translation quality and speed.
As far as we know, we are the first to propose a neural model for unsupervised CWS, and it achieves performance competitive with state-of-the-art statistical models on four datasets from the SIGHAN 2005 bakeoff.
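Unsupervised segmentation models often score candidate segments and pick the best way to split a sentence. The sketch below shows only the decoding step: a dynamic program over all segmentations with segments of at most `max_len` characters. It assumes a segment scorer `seg_logprob`, which here is a hypothetical stand-in. The snippet above does not describe the neural model, and in a neural approach the scorer would come from that model. Training such models typically maximizes the sum over segmentations (log-sum-exp in place of `max`), not the single best path computed here.

```python
import math
from typing import Callable, Dict, List


def best_segmentation(text: str, seg_logprob: Callable[[str], float], max_len: int = 4) -> List[str]:
    """Return the highest-scoring split of `text` under a per-segment log-probability."""
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best total score of a segmentation of text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last segment in that split
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            score = best[start] + seg_logprob(text[start:end])
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Follow back-pointers from the end of the text to recover the segments.
    words: List[str] = []
    end = n
    while end > 0:
        start = back[end]
        words.append(text[start:end])
        end = start
    return words[::-1]


def table_scorer(table: Dict[str, float], unknown: float = -10.0) -> Callable[[str], float]:
    """Hypothetical toy scorer: look a segment up in a table, else return a low default."""
    return lambda segment: table.get(segment, unknown)
```

With the toy table `{"我": -1.0, "喜欢": -1.5, "喜": -3.0, "欢": -3.0}`, the split `我 | 喜欢` scores -2.5, which beats `我 | 喜 | 欢` at -7.0. So `best_segmentation("我喜欢", table_scorer(table))` returns `["我", "喜欢"]`.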