1 code implementation • LREC 2020 • Suteera Seeha, Ivan Bilan, Liliana Mamani Sanchez, Johannes Huber, Michael Matuschek, Hinrich Schütze
We propose ThaiLMCut, a semi-supervised approach for Thai word segmentation which utilizes a bi-directional character language model (LM) as a way to leverage useful linguistic knowledge from unlabeled data.
Ranked #3 on Thai Word Segmentation on BEST-2010
1 code implementation • 9 Jul 2018 • Ivan Bilan, Benjamin Roth
The self-attention encoder also uses a custom implementation of relative positional encodings which allow each word in the sentence to take into account its left and right context.
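To make the idea of relative positions concrete, here is a minimal sketch of one common scheme: clipped relative offsets turned into lookup indices. This is an assumption for illustration only and not the paper's custom implementation; the function name `relative_position_ids` and the parameter `max_distance` are hypothetical.

```python
def relative_position_ids(seq_len, max_distance):
    """Build a seq_len x seq_len matrix of relative-position indices.

    Entry [i][j] encodes the offset (j - i) of word j from word i,
    clipped to [-max_distance, max_distance] and shifted by max_distance
    so that every index is non-negative (usable as an embedding lookup key).
    """
    ids = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            # Negative offsets are left context, positive offsets are right context.
            offset = max(-max_distance, min(max_distance, j - i))
            row.append(offset + max_distance)
        ids.append(row)
    return ids


if __name__ == "__main__":
    for row in relative_position_ids(4, 2):
        print(row)
```

Each row gives one word's view of the sentence: indices below `max_distance` refer to words on its left, indices above it to words on its right, so a single lookup table can cover both directions of context.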