no code implementations • LREC 2022 • Hulda Óladóttir, Þórunn Arnardóttir, Anton Ingason, Vilhjálmur Þorsteinsson
A lack of datasets for spelling and grammatical error correction in Icelandic, along with language-specific issues, has caused a dearth of spell and grammar checking systems for the language.
1 code implementation • 29 May 2023 • Svanhvít Lilja Ingólfsdóttir, Pétur Orri Ragnarsson, Haukur Páll Jónsson, Haukur Barri Símonarson, Vilhjálmur Þorsteinsson, Vésteinn Snæbjarnarson
We show that a byte-level model enables higher correction quality than a subword approach, not only for simple spelling errors, but also for more complex semantic, stylistic and grammatical issues.
no code implementations • 14 Jan 2022 • Vésteinn Snæbjarnarson, Haukur Barri Símonarson, Pétur Orri Ragnarsson, Svanhvít Lilja Ingólfsdóttir, Haukur Páll Jónsson, Vilhjálmur Þorsteinsson, Hafsteinn Einarsson
To train the models we introduce a new corpus of Icelandic text, the Icelandic Common Crawl Corpus (IC3), a collection of high quality texts found online by targeting the Icelandic top-level-domain (TLD).
no code implementations • 15 Sep 2021 • Haukur Barri Símonarson, Vésteinn Snæbjarnarson, Pétur Orri Ragnarsson, Haukur Páll Jónsson, Vilhjálmur Þorsteinsson
We present Mi{\dh}eind's submission for the English$\to$Icelandic and Icelandic$\to$English subsets of the 2021 WMT news translation task.