Japanese Sentence Compression with a Large Training Dataset

ACL 2017 · Shun Hasegawa, Yuta Kikuchi, Hiroya Takamura, Manabu Okumura

For English, high-quality deletion-based sentence compression models have been trained on large, automatically created training datasets. We apply a similar approach to Japanese sentence compression. To build a large Japanese training dataset, we adapt a method originally used to create an English training dataset to the characteristics of the Japanese language. The resulting dataset is used to train Japanese sentence compression models based on recurrent neural networks.
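Deletion-based compression is commonly framed as per-token labeling: each source word is either kept or deleted, and a sequence model such as an RNN predicts those labels. The minimal sketch below shows only how keep/delete labels could be derived from a sentence and a compression that is a subsequence of it. It assumes pre-segmented tokens, and the function names `deletion_labels` and `apply_labels` and the example sentence are hypothetical; it does not reproduce the authors' dataset-creation method or model.

```python
from typing import List, Optional


def deletion_labels(source: List[str], compressed: List[str]) -> Optional[List[int]]:
    """Return a 1 (keep) / 0 (delete) label per source token, or None if
    `compressed` cannot be obtained from `source` by deleting tokens only.

    Hypothetical helper: uses a greedy leftmost alignment, so when a token
    repeats, other equally valid labelings may exist.
    """
    labels: List[int] = []
    j = 0  # index of the next compressed token still to be matched
    for token in source:
        if j < len(compressed) and token == compressed[j]:
            labels.append(1)  # token survives in the compression
            j += 1
        else:
            labels.append(0)  # token is deleted
    # Every compressed token must have been matched, in order.
    return labels if j == len(compressed) else None


def apply_labels(source: List[str], labels: List[int]) -> List[str]:
    """Rebuild the compression by keeping only tokens labeled 1."""
    return [token for token, keep in zip(source, labels) if keep == 1]


# Hypothetical pre-segmented Japanese sentence and its compression.
source = ["昨日", "東京", "で", "大きな", "地震", "が", "あった"]
compressed = ["東京", "で", "地震", "が", "あった"]
labels = deletion_labels(source, compressed)
print(labels)
print(apply_labels(source, labels))
```

Framing the task as labeling keeps every output a strict subsequence of the input, which is what makes the compression purely deletion-based; a pair whose compression is not a subsequence returns `None` and would be discarded rather than labeled.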
