Addressing Domain Adaptation for Chinese Word Segmentation with Global Recurrent Structure
Boundary features are widely used in traditional Chinese Word Segmentation (CWS) methods as they can utilize unlabeled data to help improve the Out-of-Vocabulary (OOV) word recognition performance. Although various neural network methods for CWS have achieved performance competitive with state-of-the-art systems, these methods, constrained by the domain and size of the training corpus, do not work well in domain adaptation. In this paper, we propose a novel BLSTM-based neural network model which incorporates a global recurrent structure designed for modeling boundary features dynamically. Experiments show that the proposed structure can effectively boost the performance of Chinese Word Segmentation, especially OOV-Recall, which brings benefits to domain adaptation. We achieved state-of-the-art results on 6 domains of CNKI articles, and competitive results to the best reported on the 4 domains of SIGHAN Bakeoff 2010 data.
PDF Abstract IJCNLP 2017 PDF IJCNLP 2017 Abstract