Search Results for author: Farhan Samir

Found 8 papers, 4 papers with code

One Wug, Two Wug+s: Transformer Inflection Models Hallucinate Affixes

no code implementations ComputEL (ACL) 2022 Farhan Samir, Miikka Silfverberg

Data augmentation strategies are increasingly important in NLP pipelines for low-resourced and endangered languages, and in neural morphological inflection, augmentation by so-called data hallucination is a popular technique.

Data Augmentation, Hallucination

An Inflectional Database for Gitksan

1 code implementation LREC 2022 Bruce Oliver, Clarissa Forbes, Changbing Yang, Farhan Samir, Edith Coates, Garrett Nicolai, Miikka Silfverberg

We use Gitksan data in interlinear glossed format, stemming from language documentation efforts, to build a database of partial inflection tables.

Data Augmentation, Hallucination

The taste of IPA: Towards open-vocabulary keyword spotting and forced alignment in any language

1 code implementation 14 Nov 2023 Jian Zhu, Changbing Yang, Farhan Samir, Jahurul Islam

In this project, we demonstrate that phoneme-based models for speech processing can achieve strong crosslinguistic generalizability to unseen languages.

Keyword Spotting

Understanding Compositional Data Augmentation in Typologically Diverse Morphological Inflection

1 code implementation 23 May 2023 Farhan Samir, Miikka Silfverberg

In this study, we aim to shed light on the theoretical aspects of the prominent data augmentation strategy StemCorrupt (Silfverberg et al., 2017; Anastasopoulos and Neubig, 2019), a method that generates synthetic examples by randomly substituting stem characters in gold-standard training examples.

Attribute, Data Augmentation
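The StemCorrupt strategy summarized above can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the stem is assumed to be the longest common substring of the lemma and the inflected form, pairs whose shared substring is shorter than three characters are skipped, and replacement characters are drawn uniformly from the training alphabet. The function names `stem_corrupt` and `augment`, the parameters `p` and `min_stem`, and the toy English examples are all hypothetical.

```python
import random
from difflib import SequenceMatcher


def longest_common_substring(a, b):
    """Return (start_in_a, start_in_b, length) of the longest common substring."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return match.a, match.b, match.size


def stem_corrupt(lemma, form, tags, alphabet, rng, p=1.0, min_stem=3):
    """Hypothetical StemCorrupt-style augmentation of one (lemma, form, tags) triple.

    Assumption: the stem is the longest common substring of lemma and form.
    Each stem character is replaced with probability p by a random alphabet symbol.
    Returns None when no sufficiently long stem is found.
    """
    a0, b0, n = longest_common_substring(lemma, form)
    if n < min_stem:
        return None  # no reliable stem; skip this example
    stem = lemma[a0:a0 + n]  # identical to form[b0:b0 + n]
    # Build the corrupted stem once so lemma and form stay consistent.
    new_stem = "".join(rng.choice(alphabet) if rng.random() < p else ch for ch in stem)
    new_lemma = lemma[:a0] + new_stem + lemma[a0 + n:]
    new_form = form[:b0] + new_stem + form[b0 + n:]
    return new_lemma, new_form, tags  # affixes and tags are left untouched


def augment(data, k, seed=0):
    """Sample k gold examples and corrupt their stems (hypothetical helper)."""
    rng = random.Random(seed)
    alphabet = sorted({c for lemma, form, _ in data for c in lemma + form})
    synthetic = []
    for _ in range(k):
        lemma, form, tags = rng.choice(data)
        example = stem_corrupt(lemma, form, tags, alphabet, rng)
        if example is not None:
            synthetic.append(example)
    return synthetic


if __name__ == "__main__":
    train = [("walk", "walked", "V;PST"), ("jump", "jumps", "V;PRS;3;SG")]
    for triple in augment(train, 3):
        print(triple)
```

In this sketch, the same corrupted stem is spliced into both the lemma and the inflected form. The affix (for example, `-ed`) therefore survives intact, while the stem characters become uninformative, which is the intuition behind substituting only stem characters.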

Dim Wihl Gat Tun: The Case for Linguistic Expertise in NLP for Underdocumented Languages

no code implementations 17 Mar 2022 Clarissa Forbes, Farhan Samir, Bruce Harold Oliver, Changbing Yang, Edith Coates, Garrett Nicolai, Miikka Silfverberg

With this paper, we make the case that interlinear glossed text (IGT) data can be leveraged successfully, provided that target-language expertise is available.

Quantifying Cognitive Factors in Lexical Decline

1 code implementation 12 Oct 2021 David Francis, Ella Rabinovich, Farhan Samir, David Mortensen, Suzanne Stevenson

Specifically, we propose a variety of psycholinguistic factors (semantic, distributional, and phonological) that we hypothesize are predictive of lexical decline, in which words greatly decrease in frequency over time.
