Morphological Disambiguation of South Sámi with FSTs and Neural Networks
We present a method for morphological disambiguation of South Sámi, an endangered language. The method uses an FST-based morphological analyzer to produce an ambiguous set of morphological readings for each word in a sentence. These readings are disambiguated with a Bi-RNN model trained on the related North Sámi UD Treebank and some synthetically generated South Sámi data. Disambiguation is done at the level of morphological tags, ignoring word forms and lemmas; this makes it possible to use North Sámi training data for South Sámi without a bilingual dictionary or aligned word embeddings. Our approach requires only minimal resources for South Sámi, which makes it applicable to other endangered languages as well.
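To make the tag-level setup concrete, the following is a minimal sketch, not the authors' implementation. It assumes the analyzer emits strings of the form `lemma+Tag+Tag`; that format, the tag names, the placeholder lemmas, and the functions `delexicalize`, `ambiguous_readings`, `disambiguate`, and `toy_score` are all hypothetical. The scorer is a stand-in for the Bi-RNN, which is not reproduced here.

```python
from typing import Callable, List, Sequence, Tuple

# A reading is a tuple of morphological tags only, e.g. ("N", "Sg", "Nom").
Reading = Tuple[str, ...]
Sentence = List[List[Reading]]


def delexicalize(analysis: str) -> Reading:
    """Drop the lemma from an analysis string such as 'lemma+N+Sg+Nom'.

    The 'lemma+Tag+...' format is an assumption made for this sketch.
    """
    _lemma, *tags = analysis.split("+")
    return tuple(tags)


def ambiguous_readings(analyses_per_word: Sequence[Sequence[str]]) -> Sentence:
    """Turn each word's analyses into a de-duplicated list of tag-only readings."""
    sentence: Sentence = []
    for analyses in analyses_per_word:
        readings: List[Reading] = []
        for analysis in analyses:
            reading = delexicalize(analysis)
            if reading not in readings:
                readings.append(reading)
        sentence.append(readings)
    return sentence


def disambiguate(
    sentence: Sentence,
    score: Callable[[Sentence, int, Reading], float],
) -> List[Reading]:
    """Pick the highest-scoring reading for each word.

    `score` is a placeholder for a trained sequence model.
    """
    return [
        max(candidates, key=lambda reading: score(sentence, i, reading))
        for i, candidates in enumerate(sentence)
    ]


def toy_score(sentence: Sentence, position: int, reading: Reading) -> float:
    """Toy scorer that prefers noun readings; illustration only."""
    return 1.0 if reading and reading[0] == "N" else 0.0


# Placeholder lemmas "a", "b", "c" stand in for real analyzer output.
example = ambiguous_readings(
    [["a+N+Sg+Nom", "b+N+Sg+Nom", "a+V+Ind+Prs+Sg3"], ["c+V+Ind+Prs+Sg3"]]
)
chosen = disambiguate(example, toy_score)
```

Because only the tags survive `delexicalize`, the input to the scorer contains no word forms or lemmas, so a model trained on tag sequences from one language could in principle be applied to analyzer output from a related language that uses the same tag set.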