no code implementations • Findings (ACL) 2022 • Clarissa Forbes, Farhan Samir, Bruce Oliver, Changbing Yang, Edith Coates, Garrett Nicolai, Miikka Silfverberg
With this paper, we make the case that IGT data can be leveraged successfully, provided that target-language expertise is available.
no code implementations • ComputEL (ACL) 2022 • Farhan Samir, Miikka Silfverberg
Data augmentation strategies are increasingly important in NLP pipelines for low-resourced and endangered languages; in neural morphological inflection, augmentation by so-called data hallucination is a popular technique.
1 code implementation • LREC 2022 • Bruce Oliver, Clarissa Forbes, Changbing Yang, Farhan Samir, Edith Coates, Garrett Nicolai, Miikka Silfverberg
We use Gitksan data in interlinear glossed format, stemming from language documentation efforts, to build a database of partial inflection tables.
1 code implementation • 14 Nov 2023 • Jian Zhu, Changbing Yang, Farhan Samir, Jahurul Islam
In this project, we demonstrate that phoneme-based models for speech processing can achieve strong cross-linguistic generalizability to unseen languages.
1 code implementation • 23 May 2023 • Farhan Samir, Miikka Silfverberg
In this study, we aim to shed light on the theoretical aspects of the prominent data augmentation strategy StemCorrupt (Silfverberg et al., 2017; Anastasopoulos and Neubig, 2019), a method that generates synthetic examples by randomly substituting stem characters in gold-standard training examples.
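The core idea behind StemCorrupt can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `stem_corrupt` is hypothetical, and the stem is approximated here by the longest common substring of the lemma and the inflected form, a simplification of how the actual method identifies stems.

```python
import random

def stem_corrupt(lemma, target, alphabet, sub_prob=0.5, seed=None):
    """Hypothetical sketch of StemCorrupt-style data hallucination.

    Characters in the longest common substring of lemma and target
    (a crude stem proxy) are randomly substituted with characters
    from the alphabet, yielding a synthetic (lemma, target) pair
    that preserves the affixal material.
    """
    rng = random.Random(seed)
    # Approximate the stem as the longest common substring.
    best = ""
    for i in range(len(lemma)):
        for j in range(i + 1, len(lemma) + 1):
            if lemma[i:j] in target and j - i > len(best):
                best = lemma[i:j]
    if not best:
        return lemma, target
    # Substitute each stem character with probability sub_prob.
    new_stem = "".join(
        rng.choice(alphabet) if rng.random() < sub_prob else c
        for c in best
    )
    # Apply the same corrupted stem in both lemma and target,
    # leaving prefixes/suffixes (the inflectional material) intact.
    return lemma.replace(best, new_stem), target.replace(best, new_stem)
```

For example, corrupting the pair ("walk", "walked") rewrites the shared stem in both strings while keeping the "-ed" suffix, so the synthetic example still exhibits the original inflection pattern.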
1 code implementation • 12 Oct 2021 • David Francis, Ella Rabinovich, Farhan Samir, David Mortensen, Suzanne Stevenson
Specifically, we propose a variety of psycholinguistic factors -- semantic, distributional, and phonological -- that we hypothesize are predictive of lexical decline, in which words greatly decrease in frequency over time.