Search Results for author: Jack Rueter

Found 32 papers, 13 papers with code

Processing M.A. Castrén’s Materials: Multilingual Historical Typed and Handwritten Manuscripts

no code implementations • NLP4DH (ICON) 2021 • Niko Partanen, Jack Rueter, Khalid Alnajjar, Mika Hämäläinen

The study forms a technical report of various tasks that have been performed on the materials collected and published by Finnish ethnographer and linguist, Matthias Alexander Castrén (1813–1852).

Using Graph-Based Methods to Augment Online Dictionaries of Endangered Languages

no code implementations • ComputEL (ACL) 2022 • Khalid Alnajjar, Mika Hämäläinen, Niko Tapio Partanen, Jack Rueter

Many endangered Uralic languages have multilingual machine readable dictionaries saved in an XML format.

Morphosyntactic Disambiguation in an Endangered Language Setting

no code implementations • WS (NoDaLiDa) 2019 • Jeff Ens, Mika Hämäläinen, Jack Rueter, Philippe Pasquier

Endangered Uralic languages present a high variety of inflectional forms in their morphology.

Sentence

Overview of Open-Source Morphology Development for the Komi-Zyrian Language: Past and future

1 code implementation • ACL (IWCLUL) 2021 • Jack Rueter, Niko Partanen, Mika Hämäläinen, Trond Trosterud

Numerals and what counts

no code implementations • UDW (SyntaxFest) 2021 • Jack Rueter, Niko Partanen, Flammie A. Pirinen

Linguistic change and historical periodization of Old Literary Finnish

no code implementations • ACL (LChange) 2021 • Niko Partanen, Khalid Alnajjar, Mika Hämäläinen, Jack Rueter

In this study, we have normalized and lemmatized an Old Literary Finnish corpus using a lemmatization model trained on texts from Agricola.

Lemmatization Word Embeddings

Sentiment Analysis Using Aligned Word Embeddings for Uralic Languages

no code implementations • 24 May 2023 • Khalid Alnajjar, Mika Hämäläinen, Jack Rueter

Furthermore, we align these word embeddings and present a novel neural network model that is trained on English data to conduct sentiment analysis and then applied on endangered language data through the aligned word embeddings.

Sentiment Analysis Word Embeddings

Processing M.A. Castrén's Materials: Multilingual Typed and Handwritten Manuscripts

no code implementations • 28 Dec 2021 • Niko Partanen, Jack Rueter, Mika Hämäläinen, Khalid Alnajjar

The study forms a technical report of various tasks that have been performed on the materials collected and published by Finnish ethnographer and linguist, Matthias Alexander Castrén (1813–1852).

Detecting Depression in Thai Blog Posts: a Dataset and a Baseline

no code implementations • WNUT (ACL) 2021 • Mika Hämäläinen, Pattama Patpong, Khalid Alnajjar, Niko Partanen, Jack Rueter

We present the first openly available corpus for detecting depression in Thai.

Finnish Dialect Identification: The Effect of Audio and Text

1 code implementation • EMNLP 2021 • Mika Hämäläinen, Khalid Alnajjar, Niko Partanen, Jack Rueter

Finnish is a language with multiple dialects that not only differ from each other in terms of accent (pronunciation) but also in terms of morphological forms and lexical choice.

Dialect Identification

Never guess what I heard... Rumor Detection in Finnish News: a Dataset and a Baseline

no code implementations • NAACL (NLP4IF) 2021 • Mika Hämäläinen, Khalid Alnajjar, Niko Partanen, Jack Rueter

However, a model fine-tuned on Multilingual BERT reaches the best factual label accuracy of 97.2%.

Apurinã Universal Dependencies Treebank

no code implementations • NAACL (AmericasNLP) 2021 • Jack Rueter, Marília Fernanda Pereira de Freitas, Sidney da Silva Facundes, Mika Hämäläinen, Niko Partanen

The construction of the treebank has also served as an opportunity to develop finite-state description of the language and facilitate the transfer of open-source infrastructure possibilities to an endangered language of the Amazon.

Neural Morphology Dataset and Models for Multiple Languages, from the Large to the Endangered

1 code implementation • NoDaLiDa 2021 • Mika Hämäläinen, Niko Partanen, Jack Rueter, Khalid Alnajjar

We train neural models for morphological analysis, generation and lemmatization for morphologically rich languages.

Lemmatization Morphological Analysis

Ve'rdd. Narrowing the Gap between Paper Dictionaries, Low-Resource NLP and Community Involvement

1 code implementation • COLING 2020 • Khalid Alnajjar, Mika Hämäläinen, Jack Rueter, Niko Partanen

We present an open-source online dictionary editing system, Ve'rdd, that offers a chance to re-evaluate and edit grassroots dictionaries that have been exposed to multiple amateur editors.

Open-Source Morphology for Endangered Mordvinic Languages

2 code implementations • 11 Nov 2020 • Jack Rueter, Mika Hämäläinen, Niko Partanen

This document describes shared development of finite-state description of two closely related but endangered minority languages, Erzya and Moksha.

Unity

Automated Prediction of Medieval Arabic Diacritics

1 code implementation • 11 Oct 2020 • Khalid Alnajjar, Mika Hämäläinen, Niko Partanen, Jack Rueter

This study uses a character level neural machine translation approach trained on a long short-term memory-based bi-directional recurrent neural network architecture for diacritization of Medieval Arabic.

Machine Translation Translation

On Editing Dictionaries for Uralic Languages in an Online Environment

no code implementations • WS 2020 • Khalid Alnajjar, Mika Hämäläinen, Jack Rueter

On the questions in developing computational infrastructure for Komi-Permyak

1 code implementation • WS 2020 • Jack Rueter, Niko Partanen, Larisa Ponomareva

Automatic Dialect Adaptation in Finnish and its Effect on Perceived Creativity

1 code implementation • 6 Sep 2020 • Mika Hämäläinen, Niko Partanen, Khalid Alnajjar, Jack Rueter, Thierry Poibeau

The models are tested with over 20 different dialects.

NMT Transfer Learning

FST Morphology for the Endangered Skolt Sami Language

1 code implementation • LREC 2020 • Jack Rueter, Mika Hämäläinen

We present advances in the development of a FST-based morphological analyzer and generator for Skolt Sami.

Morphological Analysis

An Open Online Dictionary for Endangered Uralic Languages

1 code implementation • The sixth biennial conference on electronic lexicography, eLex 2019 • Mika Hämäläinen, Jack Rueter

This makes it possible to integrate the system with the existing open-source Giellatekno infrastructure that provides and utilizes XML formatted dictionaries for use in a variety of NLP tasks.

Survey of Uralic Universal Dependencies development

no code implementations • WS 2019 • Niko Partanen, Jack Rueter

Revisiting NMT for Normalization of Early English Letters

1 code implementation • WS 2019 • Mika Hämäläinen, Tanja Säily, Jack Rueter, Jörg Tiedemann, Eetu Mäkelä

This paper studies the use of NMT (neural machine translation) as a normalization method for an early English letter corpus.

Lemmatization Machine Translation +2

Finding Sami Cognates with a Character-Based NMT Approach

no code implementations • WS 2019 • Mika Hämäläinen, Jack Rueter

NMT

Normalizing Early English Letters to Present-day English Spelling

no code implementations • COLING 2018 • Mika Hämäläinen, Tanja Säily, Jack Rueter, Jörg Tiedemann, Eetu Mäkelä

This paper presents multiple methods for normalizing the most deviant and infrequent historical spellings in a corpus consisting of personal correspondence from the 15th to the 19th century.

Machine Translation Translation