Search Results for author: Svanhvít Lilja Ingólfsdóttir

Found 5 papers, 1 paper with code

Towards High Accuracy Named Entity Recognition for Icelandic

no code implementations • WS (NoDaLiDa) 2019 • Svanhvít Lilja Ingólfsdóttir, Sigurjón Þorsteinsson, Hrafn Loftsson

We report on work in progress which consists of annotating an Icelandic corpus for named entities (NEs) and using it for training a named entity recognizer based on a Bidirectional Long Short-Term Memory model.

Miscellaneous named-entity-recognition +5

A Warm Start and a Clean Crawled Corpus - A Recipe for Good Language Models

no code implementations • LREC 2022 • Vésteinn Snæbjarnarson, Haukur Barri Símonarson, Pétur Orri Ragnarsson, Svanhvít Lilja Ingólfsdóttir, Haukur Jónsson, Vilhjálmur Þorsteinsson, Hafsteinn Einarsson

To train the models we introduce a new corpus of Icelandic text, the Icelandic Common Crawl Corpus (IC3), a collection of high-quality texts found online by targeting the Icelandic top-level domain .is.

Constituency Parsing Grammatical Error Detection +4

Byte-Level Grammatical Error Correction Using Synthetic and Curated Corpora

1 code implementation • 29 May 2023 • Svanhvít Lilja Ingólfsdóttir, Pétur Orri Ragnarsson, Haukur Páll Jónsson, Haukur Barri Símonarson, Vilhjálmur Þorsteinsson, Vésteinn Snæbjarnarson

We show that a byte-level model enables higher correction quality than a subword approach, not only for simple spelling errors, but also for more complex semantic, stylistic and grammatical issues.

Grammatical Error Correction

A Warm Start and a Clean Crawled Corpus -- A Recipe for Good Language Models

no code implementations • 14 Jan 2022 • Vésteinn Snæbjarnarson, Haukur Barri Símonarson, Pétur Orri Ragnarsson, Svanhvít Lilja Ingólfsdóttir, Haukur Páll Jónsson, Vilhjálmur Þorsteinsson, Hafsteinn Einarsson

Constituency Parsing Grammatical Error Detection +4

Nefnir: A high accuracy lemmatizer for Icelandic

no code implementations • WS (NoDaLiDa) 2019 • Svanhvít Lilja Ingólfsdóttir, Hrafn Loftsson, Jón Friðrik Daðason, Kristín Bjarnadóttir

Lemmatization, finding the basic morphological form of a word in a corpus, is an important step in many natural language processing tasks when working with morphologically rich languages.

Lemmatization POS +1
