no code implementations • WMT (EMNLP) 2021 • Haukur Barri Símonarson, Vésteinn Snæbjarnarson, Pétur Orri Ragnarsson, Haukur Jónsson, Vilhjálmur Þorsteinsson
We present Miðeind’s submission for the English→Icelandic and Icelandic→English subsets of the 2021 WMT news translation task.
no code implementations • LREC 2022 • Vésteinn Snæbjarnarson, Haukur Barri Símonarson, Pétur Orri Ragnarsson, Svanhvít Lilja Ingólfsdóttir, Haukur Jónsson, Vilhjálmur Þorsteinsson, Hafsteinn Einarsson
To train the models we introduce a new corpus of Icelandic text, the Icelandic Common Crawl Corpus (IC3), a collection of high-quality texts found online by targeting the Icelandic top-level domain .is.
no code implementations • LREC 2022 • Vésteinn Snæbjarnarson, Hafsteinn Einarsson
The dataset is a valuable resource for Icelandic which we demonstrate by creating and evaluating a system capable of extractive QA in Icelandic.
no code implementations • 6 Apr 2024 • Kevin Du, Vésteinn Snæbjarnarson, Niklas Stoehr, Jennifer C. White, Aaron Schein, Ryan Cotterell
To answer a question, language models often need to integrate prior knowledge learned during pretraining and new information presented in context.
1 code implementation • 29 May 2023 • Svanhvít Lilja Ingólfsdóttir, Pétur Orri Ragnarsson, Haukur Páll Jónsson, Haukur Barri Símonarson, Vilhjálmur Þorsteinsson, Vésteinn Snæbjarnarson
We show that a byte-level model enables higher correction quality than a subword approach, not only for simple spelling errors, but also for more complex semantic, stylistic and grammatical issues.
no code implementations • 18 Apr 2023 • Vésteinn Snæbjarnarson, Annika Simonsen, Goran Glavaš, Ivan Vulić
Multilingual language models have pushed the state of the art in cross-lingual NLP transfer.
1 code implementation • ICCV 2023 • Idan Schwartz, Vésteinn Snæbjarnarson, Hila Chefer, Ryan Cotterell, Serge Belongie, Lior Wolf, Sagie Benaim
This approach has two disadvantages: (i) supervised datasets are generally small compared to large-scale scraped text-image datasets on which text-to-image models are trained, affecting the quality and diversity of the generated images, or (ii) the input is a hard-coded label, as opposed to free-form text, limiting the control over the generated images.
no code implementations • 17 Nov 2022 • Peter Ebert Christensen, Vésteinn Snæbjarnarson, Andrea Dittadi, Serge Belongie, Sagie Benaim
We demonstrate that APT is capable of a wide range of class-preserving semantic image manipulations that fool a variety of pretrained classifiers.
no code implementations • NAACL (MIA) 2022 • Vésteinn Snæbjarnarson, Hafsteinn Einarsson
Our approach requires only limited QA resources in the given language, along with machine-translated data, and at least a bilingual language model.
no code implementations • 14 Jan 2022 • Vésteinn Snæbjarnarson, Haukur Barri Símonarson, Pétur Orri Ragnarsson, Svanhvít Lilja Ingólfsdóttir, Haukur Páll Jónsson, Vilhjálmur Þorsteinsson, Hafsteinn Einarsson
To train the models we introduce a new corpus of Icelandic text, the Icelandic Common Crawl Corpus (IC3), a collection of high quality texts found online by targeting the Icelandic top-level-domain (TLD).
no code implementations • 15 Sep 2021 • Haukur Barri Símonarson, Vésteinn Snæbjarnarson, Pétur Orri Ragnarsson, Haukur Páll Jónsson, Vilhjálmur Þorsteinsson
We present Miðeind's submission for the English→Icelandic and Icelandic→English subsets of the 2021 WMT news translation task.
no code implementations • 11 Aug 2021 • Haukur Barri Símonarson, Vésteinn Snæbjarnarson
We present a new Icelandic-English parallel corpus, the Icelandic Parallel Abstracts Corpus (IPAC), composed of abstracts from student theses and dissertations.