no code implementations • LREC 2022 • Vésteinn Snæbjarnarson, Haukur Barri Símonarson, Pétur Orri Ragnarsson, Svanhvít Lilja Ingólfsdóttir, Haukur Jónsson, Vilhjalmur THorsteinsson, Hafsteinn Einarsson
To train the models we introduce a new corpus of Icelandic text, the Icelandic Common Crawl Corpus (IC3), a collection of high quality texts found online by targeting the Icelandic top-level-domain . is.
no code implementations • WMT (EMNLP) 2021 • Haukur Barri Símonarson, Vésteinn Snæbjarnarson, Pétur Orri Ragnarson, Haukur Jónsson, Vilhjalmur THorsteinsson
We present Miðeind’s submission for the English→Icelandic and Icelandic→English subsets of the 2021 WMT news translation task.
1 code implementation • 29 May 2023 • Svanhvít Lilja Ingólfsdóttir, Pétur Orri Ragnarsson, Haukur Páll Jónsson, Haukur Barri Símonarson, Vilhjálmur Þorsteinsson, Vésteinn Snæbjarnarson
We show that a byte-level model enables higher correction quality than a subword approach, not only for simple spelling errors, but also for more complex semantic, stylistic and grammatical issues.
no code implementations • 14 Jan 2022 • Vésteinn Snæbjarnarson, Haukur Barri Símonarson, Pétur Orri Ragnarsson, Svanhvít Lilja Ingólfsdóttir, Haukur Páll Jónsson, Vilhjálmur Þorsteinsson, Hafsteinn Einarsson
To train the models we introduce a new corpus of Icelandic text, the Icelandic Common Crawl Corpus (IC3), a collection of high quality texts found online by targeting the Icelandic top-level-domain (TLD).
no code implementations • 15 Sep 2021 • Haukur Barri Símonarson, Vésteinn Snæbjarnarson, Pétur Orri Ragnarsson, Haukur Páll Jónsson, Vilhjálmur Þorsteinsson
We present Mi{\dh}eind's submission for the English$\to$Icelandic and Icelandic$\to$English subsets of the 2021 WMT news translation task.
no code implementations • 11 Aug 2021 • Haukur Barri Símonarson, Vésteinn Snæbjarnarson
We present a new Icelandic-English parallel corpus, the Icelandic Parallel Abstracts Corpus (IPAC), composed of abstracts from student theses and dissertations.