no code implementations • EAMT 2022 • Toms Bergmanis, Mārcis Pinnis, Roberts Rozis, Jānis Šlapiņš, Valters Šics, Berta Bernāne, Guntars Pužulis, Endijs Titomers, Andre Tättar, Taido Purason, Hele-Andra Kuulmets, Agnes Luhtaru, Liisa Rätsep, Maali Tars, Annika Laumets-Tättar, Mark Fishel
We present the MTee project, a research initiative funded via an Estonian public procurement to develop machine translation technology that is open-source and free of charge.
no code implementations • 28 Sep 2022 • Toms Bergmanis, Mārcis Pinnis
In this paper, we examine the development and usage of six low-resource machine translation systems translating between the Ukrainian language and each of the official languages of the Baltic states.
no code implementations • LREC 2022 • Andis Lagzdiņš, Uldis Siliņš, Mārcis Pinnis, Toms Bergmanis, Artūrs Vasiļevskis, Andrejs Vasiļjevs
Consolidated access to current and reliable terms from different subject fields and languages is necessary for content creators and translators.
no code implementations • WMT (EMNLP) 2021 • Toms Bergmanis, Mārcis Pinnis
The majority of language domains require prudent use of terminology to ensure clarity and adequacy of information conveyed.
1 code implementation • EACL 2021 • Toms Bergmanis, Mārcis Pinnis
Most of the recent work on terminology integration in machine translation has assumed that terminology translations are given already inflected in forms that are suitable for the target language sentence.
1 code implementation • WMT (EMNLP) 2020 • Artūrs Stafanovičs, Toms Bergmanis, Mārcis Pinnis
to a language with grammatical gender, it might be necessary to determine the gender of the subject "secretary".
no code implementations • 11 Sep 2020 • Toms Bergmanis, Artūrs Stafanovičs, Mārcis Pinnis
Neural machine translation systems are typically trained on curated corpora and break when faced with non-standard orthography or punctuation.
1 code implementation • NAACL 2019 • Toms Bergmanis, Sharon Goldwater
Lemmatization aims to reduce the sparse data problem by relating the inflected forms of a word to its dictionary form.
1 code implementation • 2 Apr 2019 • Toms Bergmanis, Sharon Goldwater
Lemmatization aims to reduce the sparse data problem by relating the inflected forms of a word to its dictionary form.
no code implementations • NAACL 2018 • Toms Bergmanis, Sharon Goldwater
The main motivation for developing context-sensitive lemmatizers is to improve performance on unseen and ambiguous words.
no code implementations • EACL 2017 • Toms Bergmanis, Sharon Goldwater
A major motivation for unsupervised morphological analysis is to reduce the sparse data problem in under-resourced languages.