Search Results for author: Jack Rueter

Found 32 papers, 13 papers with code

Processing M.A. Castrén’s Materials: Multilingual Historical Typed and Handwritten Manuscripts

no code implementations NLP4DH (ICON) 2021 Niko Partanen, Jack Rueter, Khalid Alnajjar, Mika Hämäläinen

The study forms a technical report of various tasks that have been performed on the materials collected and published by Finnish ethnographer and linguist, Matthias Alexander Castrén (1813–1852).

Linguistic change and historical periodization of Old Literary Finnish

no code implementations ACL (LChange) 2021 Niko Partanen, Khalid Alnajjar, Mika Hämäläinen, Jack Rueter

In this study, we have normalized and lemmatized an Old Literary Finnish corpus using a lemmatization model trained on texts from Agricola.

Lemmatization Word Embeddings

Sentiment Analysis Using Aligned Word Embeddings for Uralic Languages

no code implementations 24 May 2023 Khalid Alnajjar, Mika Hämäläinen, Jack Rueter

Furthermore, we align these word embeddings and present a novel neural network model that is trained on English data to conduct sentiment analysis and then applied on endangered language data through the aligned word embeddings.

Sentiment Analysis Word Embeddings

Processing M.A. Castrén's Materials: Multilingual Typed and Handwritten Manuscripts

no code implementations 28 Dec 2021 Niko Partanen, Jack Rueter, Mika Hämäläinen, Khalid Alnajjar

The study forms a technical report of various tasks that have been performed on the materials collected and published by Finnish ethnographer and linguist, Matthias Alexander Castrén (1813–1852).

Finnish Dialect Identification: The Effect of Audio and Text

1 code implementation EMNLP 2021 Mika Hämäläinen, Khalid Alnajjar, Niko Partanen, Jack Rueter

Finnish is a language with multiple dialects that not only differ from each other in terms of accent (pronunciation) but also in terms of morphological forms and lexical choice.

Dialect Identification

Apurinã Universal Dependencies Treebank

no code implementations NAACL (AmericasNLP) 2021 Jack Rueter, Marília Fernanda Pereira de Freitas, Sidney da Silva Facundes, Mika Hämäläinen, Niko Partanen

The construction of the treebank has also served as an opportunity to develop finite-state description of the language and facilitate the transfer of open-source infrastructure possibilities to an endangered language of the Amazon.

Ve'rdd. Narrowing the Gap between Paper Dictionaries, Low-Resource NLP and Community Involvement

1 code implementation COLING 2020 Khalid Alnajjar, Mika Hämäläinen, Jack Rueter, Niko Partanen

We present an open-source online dictionary editing system, Ve'rdd, that offers a chance to re-evaluate and edit grassroots dictionaries that have been exposed to multiple amateur editors.

Open-Source Morphology for Endangered Mordvinic Languages

2 code implementations 11 Nov 2020 Jack Rueter, Mika Hämäläinen, Niko Partanen

This document describes shared development of finite-state description of two closely related but endangered minority languages, Erzya and Moksha.

Unity

Automated Prediction of Medieval Arabic Diacritics

1 code implementation 11 Oct 2020 Khalid Alnajjar, Mika Hämäläinen, Niko Partanen, Jack Rueter

This study uses a character level neural machine translation approach trained on a long short-term memory-based bi-directional recurrent neural network architecture for diacritization of Medieval Arabic.

Machine Translation Translation

FST Morphology for the Endangered Skolt Sami Language

1 code implementation LREC 2020 Jack Rueter, Mika Hämäläinen

We present advances in the development of a FST-based morphological analyzer and generator for Skolt Sami.

Morphological Analysis

An Open Online Dictionary for Endangered Uralic Languages

1 code implementation The sixth biennial conference on electronic lexicography, eLex 2019 Mika Hämäläinen, Jack Rueter

This makes it possible to integrate the system with the existing open-source Giellatekno infrastructure that provides and utilizes XML formatted dictionaries for use in a variety of NLP tasks.

Normalizing Early English Letters to Present-day English Spelling

no code implementations COLING 2018 Mika Hämäläinen, Tanja Säily, Jack Rueter, Jörg Tiedemann, Eetu Mäkelä

This paper presents multiple methods for normalizing the most deviant and infrequent historical spellings in a corpus consisting of personal correspondence from the 15th to the 19th century.

Machine Translation Translation
