no code implementations • UDW (COLING) 2020 • Þórunn Arnardóttir, Hinrik Hafsteinsson, Einar Freyr Sigurðsson, Kristín Bjarnadóttir, Anton Karl Ingason, Hildur Jónsdóttir, Steinþór Steingrímsson
The topic of this paper is a rule-based pipeline for converting constituency treebanks based on the Penn Treebank format to Universal Dependencies (UD).
no code implementations • RANLP (BUCC) 2021 • Steinþór Steingrímsson, Pintu Lohar, Hrafn Loftsson, Andy Way
Parallel sentences extracted from comparable corpora can be useful to supplement parallel corpora when training machine translation (MT) systems.
no code implementations • LREC 2022 • Steinunn Rut Friðriksdóttir, Hjalti Daníelsson, Steinþór Steingrímsson, Einar Sigurdsson
Word embedding models have become commonplace in a wide range of NLP applications.
no code implementations • WS (NoDaLiDa) 2019 • Starkaður Barkarson, Steinþór Steingrímsson
We estimate that approximately 5% of the corpus data is noise or faulty alignments, while more than 50% of the segments we deleted were faulty.
no code implementations • WS (NoDaLiDa) 2019 • Kristín Bjarnadóttir, Kristín Ingibjörg Hlynsdóttir, Steinþór Steingrímsson
The topic of this paper is The Database of Icelandic Morphology (DIM), a multipurpose linguistic resource, created for use in language technology, as a reference for the general public in Iceland, and for use in research on the Icelandic language.
no code implementations • LREC 2022 • Starkaður Barkarson, Steinþór Steingrímsson, Hildur Hafsteinsdóttir
We show how the corpus has grown by almost 50% in size from the first version to the fourth and how it was restructured to better accommodate different metadata for different subcorpora.
1 code implementation • NoDaLiDa 2021 • Steinþór Steingrímsson, Hrafn Loftsson, Andy Way
Being able to generate accurate word alignments is useful for a variety of tasks.
no code implementations • NoDaLiDa 2021 • Hjalti Daníelsson, Jón Hilmar Jónsson, Þórður Arnar Árnason, Alec Shaw, Einar Freyr Sigurðsson, Steinþór Steingrímsson
The new Icelandic Word Web (IW) is a language-technology-focused redesign of a lexicosemantic database of semantically related entries.
no code implementations • GWLL (LREC) 2022 • Steinþór Steingrímsson, Luke O’Brien, Finnur Ingimundarson, Hrafn Loftsson, Andy Way
By combining the most promising approaches and data sets, and using confidence scores calculated from the data together with the results of manually evaluated samples as indicators, we are able to induce lists of translations with a very high acceptance rate.
1 code implementation • 15 Nov 2023 • Steinþór Steingrímsson, Hrafn Loftsson, Andy Way
We present SentAlign, an accurate sentence alignment tool designed to handle very large parallel document pairs.
1 code implementation • LREC 2020 • Anna Björk Nikulásdóttir, Jón Guðnason, Anton Karl Ingason, Hrafn Loftsson, Eiríkur Rögnvaldsson, Einar Freyr Sigurðsson, Steinþór Steingrímsson
In this paper, we describe a new national language technology programme for Icelandic.
1 code implementation • RANLP 2019 • Steinþór Steingrímsson, Örvar Kárason, Hrafn Loftsson
Previous work on using BiLSTM models for PoS tagging has primarily focused on small tagsets.