A Text Editing Approach to Joint Japanese Word Segmentation, POS Tagging, and Lexical Normalization

WNUT (ACL) 2021 · Shohei Higashiyama, Masao Utiyama, Taro Watanabe, Eiichiro Sumita

Lexical normalization, in addition to word segmentation and part-of-speech tagging, is a fundamental task in processing Japanese user-generated text. In this paper, we propose a text editing model that solves the three tasks jointly, along with methods for generating pseudo-labeled data to overcome the scarcity of labeled data. Our experiments showed that the proposed model achieved better normalization performance when trained on more diverse pseudo-labeled data.
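
To make the idea of solving the three tasks through text editing more concrete, the following is a minimal sketch of how per-character edit tags could be decoded into segmented, POS-tagged, and normalized words. The tag layout (a "B"/"I" segmentation label, a "KEEP"/"DEL"/"REP:<string>" edit operation, and a POS label), the function name apply_edits, and the example sentence are all assumptions made for illustration; they are not taken from the paper and may differ from the authors' actual tag set and decoding procedure.

```python
from typing import List, Tuple

# Hypothetical per-character tag: (segmentation, edit operation, POS label).
#   segmentation: "B" starts a new word, "I" continues the current word.
#   edit: "KEEP", "DEL", or "REP:<string>" (replace the character with <string>).
# This layout is an illustrative assumption, not the paper's tag set.
Tag = Tuple[str, str, str]


def apply_edits(text: str, tags: List[Tag]) -> List[Tuple[str, str, str]]:
    """Decode character-level tags into (surface, normalized, POS) words."""
    if len(text) != len(tags):
        raise ValueError("expected exactly one tag per character")

    words: List[List[str]] = []
    for ch, (seg, edit, pos) in zip(text, tags):
        # Work out what this character contributes to the normalized form.
        if edit == "KEEP":
            norm = ch
        elif edit == "DEL":
            norm = ""
        elif edit.startswith("REP:"):
            norm = edit[len("REP:"):]
        else:
            raise ValueError(f"unknown edit operation: {edit}")

        if seg == "B" or not words:
            # Start a new word; its POS is read from the first character's tag.
            words.append([ch, norm, pos])
        else:
            # Extend the surface form and normalized form of the current word.
            words[-1][0] += ch
            words[-1][1] += norm

    return [(surface, normalized, pos) for surface, normalized, pos in words]


# Example: the lengthened form "すごーい" is normalized to "すごい".
example_tags = [
    ("B", "KEEP", "ADJ"),
    ("I", "KEEP", "ADJ"),
    ("I", "DEL", "ADJ"),   # drop the prolonged sound mark
    ("I", "KEEP", "ADJ"),
    ("B", "KEEP", "NOUN"),
]
result = apply_edits("すごーい本", example_tags)
```

In this sketch, a single pass over the characters yields word boundaries, one POS label per word, and a normalized form, which is the sense in which a tagging-style model could address all three tasks at once.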
