no code implementations • Findings (ACL) 2022 • Clarissa Forbes, Farhan Samir, Bruce Oliver, Changbing Yang, Edith Coates, Garrett Nicolai, Miikka Silfverberg
With this paper, we make the case that IGT data can be leveraged successfully, provided that target-language expertise is available.
no code implementations • ComputEL (ACL) 2022 • Farhan Samir, Miikka Silfverberg
Data augmentation strategies are increasingly important in NLP pipelines for low-resourced and endangered languages; in neural morphological inflection, augmentation by so-called data hallucination is a popular technique.
1 code implementation • LREC 2022 • Bruce Oliver, Clarissa Forbes, Changbing Yang, Farhan Samir, Edith Coates, Garrett Nicolai, Miikka Silfverberg
We use Gitksan data in interlinear glossed format, stemming from language documentation efforts, to build a database of partial inflection tables.
1 code implementation • 14 Nov 2023 • Jian Zhu, Changbing Yang, Farhan Samir, Jahurul Islam
In this project, we demonstrate that phoneme-based models for speech processing can achieve strong cross-linguistic generalizability to unseen languages.
1 code implementation • 23 May 2023 • Farhan Samir, Miikka Silfverberg
In this study, we aim to shed light on the theoretical aspects of the prominent data augmentation strategy StemCorrupt (Silfverberg et al., 2017; Anastasopoulos and Neubig, 2019), a method that generates synthetic examples by randomly substituting stem characters in gold-standard training examples.
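The core idea behind StemCorrupt can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `stem_corrupt` is hypothetical, and the stem is approximated here by the longest common substring of the lemma and the inflected form, a simplification of how the actual method identifies stems.

```python
import random

def stem_corrupt(lemma, target, alphabet, sub_prob=0.5, seed=None):
    """Hypothetical sketch of StemCorrupt-style data hallucination.

    Characters in the longest common substring of lemma and target
    (a crude stem proxy) are randomly substituted with characters
    from the alphabet, yielding a synthetic (lemma, target) pair
    that preserves the affixal material.
    """
    rng = random.Random(seed)
    # Approximate the stem as the longest common substring.
    best = ""
    for i in range(len(lemma)):
        for j in range(i + 1, len(lemma) + 1):
            if lemma[i:j] in target and j - i > len(best):
                best = lemma[i:j]
    if not best:
        return lemma, target
    # Substitute each stem character with probability sub_prob.
    new_stem = "".join(
        rng.choice(alphabet) if rng.random() < sub_prob else c
        for c in best
    )
    # Apply the same corrupted stem in both lemma and target,
    # leaving prefixes/suffixes (the inflectional material) intact.
    return lemma.replace(best, new_stem), target.replace(best, new_stem)
```

For example, corrupting the pair ("walk", "walked") rewrites the shared stem in both strings while keeping the "-ed" suffix, so the synthetic example still exhibits the original inflection pattern.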
1 code implementation • 12 Oct 2021 • David Francis, Ella Rabinovich, Farhan Samir, David Mortensen, Suzanne Stevenson
Specifically, we propose a variety of psycholinguistic factors -- semantic, distributional, and phonological -- that we hypothesize are predictive of lexical decline, in which words greatly decrease in frequency over time.