Search Results for author: Svanhvít Lilja Ingólfsdóttir

Found 5 papers, 1 paper with code

Towards High Accuracy Named Entity Recognition for Icelandic

no code implementations WS (NoDaLiDa) 2019 Svanhvít Lilja Ingólfsdóttir, Sigurjón Þorsteinsson, Hrafn Loftsson

We report on work in progress which consists of annotating an Icelandic corpus for named entities (NEs) and using it for training a named entity recognizer based on a Bidirectional Long Short-Term Memory model.

Miscellaneous, named-entity-recognition +5

Byte-Level Grammatical Error Correction Using Synthetic and Curated Corpora

1 code implementation 29 May 2023 Svanhvít Lilja Ingólfsdóttir, Pétur Orri Ragnarsson, Haukur Páll Jónsson, Haukur Barri Símonarson, Vilhjálmur Þorsteinsson, Vésteinn Snæbjarnarson

We show that a byte-level model enables higher correction quality than a subword approach, not only for simple spelling errors, but also for more complex semantic, stylistic and grammatical issues.

Grammatical Error Correction

A Warm Start and a Clean Crawled Corpus -- A Recipe for Good Language Models

no code implementations 14 Jan 2022 Vésteinn Snæbjarnarson, Haukur Barri Símonarson, Pétur Orri Ragnarsson, Svanhvít Lilja Ingólfsdóttir, Haukur Páll Jónsson, Vilhjálmur Þorsteinsson, Hafsteinn Einarsson

To train the models we introduce a new corpus of Icelandic text, the Icelandic Common Crawl Corpus (IC3), a collection of high quality texts found online by targeting the Icelandic top-level-domain (TLD).

Constituency Parsing, Grammatical Error Detection +4

Nefnir: A high accuracy lemmatizer for Icelandic

no code implementations WS (NoDaLiDa) 2019 Svanhvít Lilja Ingólfsdóttir, Hrafn Loftsson, Jón Friðrik Daðason, Kristín Bjarnadóttir

Lemmatization, finding the basic morphological form of a word in a corpus, is an important step in many natural language processing tasks when working with morphologically rich languages.

Lemmatization, POS +1
