no code implementations • EMNLP (CMCL) 2020 • Kate McCurdy, Adam Lopez, Sharon Goldwater
Grammatical gender is a consistent and informative cue to the plural class of German nouns.
no code implementations • ACL (SIGMORPHON) 2021 • Kate McCurdy, Sharon Goldwater, Adam Lopez
This work describes the Edinburgh submission to the SIGMORPHON 2021 Shared Task 2 on unsupervised morphological paradigm clustering.
1 code implementation • 20 Oct 2023 • Amr Keleg, Sharon Goldwater, Walid Magdy
Transcribed speech and user-generated text in Arabic typically contain a mixture of Modern Standard Arabic (MSA), the standardized language taught in schools, and Dialectal Arabic (DA), used in daily communications.
no code implementations • 3 Jun 2023 • Ramon Sanabria, Ondrej Klejch, Hao Tang, Sharon Goldwater
Acoustic word embeddings are typically created by training a pooling function using pairs of word-like units.
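The core idea above, collapsing a variable-length segment into a fixed-dimensional vector, can be illustrated with the simplest possible pooling function. This is a hedged sketch, not the paper's method: the paper trains the pooling function from pairs of word-like units, whereas here we use untrained mean pooling and cosine similarity just to show the interface.

```python
import numpy as np

def acoustic_word_embedding(frames: np.ndarray) -> np.ndarray:
    """Collapse a variable-length sequence of acoustic frames
    (shape: n_frames x n_features) into one fixed-dimensional
    vector by mean pooling over time. Illustrative baseline only;
    learned pooling functions replace this in practice."""
    return frames.mean(axis=0)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Compare two embeddings, e.g. to decide whether two
    segments are instances of the same word."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Two segments of different lengths map to same-sized embeddings.
rng = np.random.default_rng(0)
seg_a = rng.normal(size=(37, 13))   # 37 frames of 13-dim features
seg_b = rng.normal(size=(52, 13))   # 52 frames of 13-dim features
emb_a = acoustic_word_embedding(seg_a)
emb_b = acoustic_word_embedding(seg_b)
```

Once segments share an embedding space, same/different word decisions reduce to a similarity threshold, which is what the pair-based training setup optimises for.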
no code implementations • 21 May 2023 • Oli Liu, Hao Tang, Sharon Goldwater
Self-supervised speech representations are known to encode both speaker and phonetic information, but how they are distributed in the high-dimensional space remains largely unexplored.
1 code implementation • 23 Feb 2023 • Elizabeth Nielsen, Sharon Goldwater, Mark Steedman
Parsing spoken dialogue presents challenges that parsing text does not, including a lack of clear sentence boundaries.
no code implementations • 28 Oct 2022 • Ramon Sanabria, Hao Tang, Sharon Goldwater
Given the strong results of self-supervised models on various tasks, there have been surprisingly few studies exploring self-supervised representations for acoustic word embeddings (AWE), fixed-dimensional vectors representing variable-length spoken word segments.
2 code implementations • 22 Sep 2021 • Ida Szubert, Omri Abend, Nathan Schneider, Samuel Gibbon, Louis Mahon, Sharon Goldwater, Mark Steedman
We then demonstrate the utility of the compiled corpora through (1) a longitudinal corpus study of the prevalence of different syntactic and semantic phenomena in the CDS, and (2) applying an existing computational model of language acquisition to the two corpora and briefly comparing the results across languages.
no code implementations • EMNLP (insights) 2021 • Ramon Sanabria, Hao Tang, Sharon Goldwater
Word segmentation, the problem of finding word boundaries in speech, is of interest for a range of tasks.
no code implementations • ACL 2021 • Elizabeth Nielsen, Mark Steedman, Sharon Goldwater
We investigate how prosody affects a parser that receives an entire dialogue turn as input (a turn-based model), instead of gold standard pre-segmented SUs (an SU-based model).
no code implementations • 12 May 2021 • Alexander Robertson, Walid Magdy, Sharon Goldwater
Research in sociology and linguistics shows that people use language not only to express their own identity but also to understand the identity of others.
no code implementations • 7 May 2021 • Alexander Robertson, Walid Magdy, Sharon Goldwater
Prior work has shown that Twitter users use skin-toned emoji as an act of self-representation to express their racial/ethnic identity.
no code implementations • EACL 2021 • Yevgen Matusevych, Herman Kamper, Thomas Schatz, Naomi H. Feldman, Sharon Goldwater
We then test the model on a spoken word processing task, showing that phonology may not be necessary to explain some of the word processing effects observed in non-native speakers.
no code implementations • 21 Oct 2020 • Aibek Makazhanov, Sharon Goldwater, Adam Lopez
We present LemMED, a character-level encoder-decoder for contextual morphological analysis (combined lemmatization and tagging).
no code implementations • 6 Aug 2020 • Yevgen Matusevych, Thomas Schatz, Herman Kamper, Naomi H. Feldman, Sharon Goldwater
In the first year of life, infants' speech perception becomes attuned to the sounds of their native language.
1 code implementation • 2 Jun 2020 • Herman Kamper, Yevgen Matusevych, Sharon Goldwater
We consider three multilingual recurrent neural network (RNN) models: a classifier trained on the joint vocabularies of all training languages; a Siamese RNN trained to discriminate between same and different words from multiple languages; and a correspondence autoencoder (CAE) RNN trained to reconstruct word pairs.
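The Siamese objective in the second model can be sketched with a minimal margin-based loss: embeddings of same-word pairs are pulled together, different-word pairs pushed apart. This is an illustrative NumPy sketch under simplifying assumptions, not the authors' implementation; in particular, the RNN encoder is replaced here by a fixed mean-pooling stand-in, and the margin value is arbitrary.

```python
import numpy as np

def embed(frames: np.ndarray) -> np.ndarray:
    # Stand-in encoder: mean-pool frames and L2-normalise.
    # In the paper this is a trained RNN.
    v = frames.mean(axis=0)
    return v / np.linalg.norm(v)

def siamese_margin_loss(anchor, same, diff, margin=0.5):
    """Hinge loss on cosine distances: a same-word pair should be
    closer than a different-word pair by at least `margin`."""
    d_same = 1.0 - np.dot(embed(anchor), embed(same))
    d_diff = 1.0 - np.dot(embed(anchor), embed(diff))
    return max(0.0, margin + d_same - d_diff)

rng = np.random.default_rng(1)
anchor = rng.normal(size=(40, 13))
loss = siamese_margin_loss(anchor, anchor + 0.01, rng.normal(size=(30, 13)))
```

The classifier and correspondence-autoencoder variants differ only in the training signal (word labels and paired reconstructions, respectively); all three produce the same kind of fixed-dimensional embedding.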
no code implementations • ACL 2020 • Kate McCurdy, Sharon Goldwater, Adam Lopez
Encoder-decoder models do generalize the most frequently produced plural class, but do not show human-like variability or 'regular' extension of these other plural markers.
no code implementations • EMNLP 2020 • Elizabeth Nielsen, Mark Steedman, Sharon Goldwater
We find that these innovations lead to an improvement from 87.5% to 88.7% accuracy on pitch accent detection on American English speech in the Boston University Radio News Corpus, a state-of-the-art result.
no code implementations • 3 Apr 2020 • Yevgen Matusevych, Herman Kamper, Sharon Goldwater
To better understand the applications of AWEs in various downstream tasks and in cognitive modeling, we need to analyze the representation spaces of AWEs.
1 code implementation • 6 Feb 2020 • Herman Kamper, Yevgen Matusevych, Sharon Goldwater
Acoustic word embeddings are fixed-dimensional representations of variable-length speech segments.
no code implementations • 23 Oct 2019 • Mihaela C. Stoian, Sameer Bansal, Sharon Goldwater
Previous work has shown that for low-resource source languages, automatic speech-to-text translation (AST) can be improved by pretraining an end-to-end model on automatic speech recognition (ASR) data from a high-resource language.
no code implementations • 29 Aug 2019 • Sameer Bansal, Herman Kamper, Adam Lopez, Sharon Goldwater
Given a large amount of unannotated speech in a low-resource language, can we classify the speech utterances by topic?
no code implementations • ACL 2019 • Maria Corkery, Yevgen Matusevych, Sharon Goldwater
The cognitive mechanisms needed to account for the English past tense have long been a subject of debate in linguistics and cognitive science.
1 code implementation • NAACL 2019 • Toms Bergmanis, Sharon Goldwater
Lemmatization aims to reduce the sparse data problem by relating the inflected forms of a word to its dictionary form.
1 code implementation • 2 Apr 2019 • Toms Bergmanis, Sharon Goldwater
Lemmatization aims to reduce the sparse data problem by relating the inflected forms of a word to its dictionary form.
1 code implementation • 9 Nov 2018 • Enno Hermann, Herman Kamper, Sharon Goldwater
Here we directly compare multiple methods, including some that use only target language speech data and some that use transcribed speech from other (non-target) languages, and we evaluate using two intrinsic measures as well as on a downstream unsupervised word segmentation and clustering task.
1 code implementation • WS 2018 • Philippa Shoemark, James Kirby, Sharon Goldwater
Sociolinguistics is often concerned with how variants of a linguistic item (e.g., "nothing" vs. "nothin'") are used by different groups or in different situations.
1 code implementation • NAACL 2019 • Sameer Bansal, Herman Kamper, Karen Livescu, Adam Lopez, Sharon Goldwater
Finally, we show that the approach improves performance on a true low-resource task: pre-training on a combination of English ASR and French ASR improves Mboshi-French ST, where only 4 hours of data are available, from 3.5 to 7.1 BLEU.
no code implementations • NAACL 2018 • Toms Bergmanis, Sharon Goldwater
The main motivation for developing contextsensitive lemmatizers is to improve performance on unseen and ambiguous words.
no code implementations • NAACL 2018 • Alexander Robertson, Sharon Goldwater
We highlight several issues in the evaluation of historical text normalization systems that make it hard to tell how well these systems would actually work in practice, i.e., for new datasets or languages; in comparison to more naïve systems; or as a preprocessing step for downstream NLP tools.
no code implementations • 24 Mar 2018 • Sameer Bansal, Herman Kamper, Karen Livescu, Adam Lopez, Sharon Goldwater
We explore models trained on between 20 and 160 hours of data, and find that although models trained on less data have considerably lower BLEU scores, they can still predict words with relatively high precision and recall: around 50% for a model trained on 50 hours of data, versus around 60% for the full 160-hour model.
1 code implementation • 23 Mar 2018 • Enno Hermann, Sharon Goldwater
How can we effectively develop speech technology for languages where no transcribed data is available?
no code implementations • WS 2017 • Antonios Anastasopoulos, Sameer Bansal, David Chiang, Sharon Goldwater, Adam Lopez
Vast amounts of speech data collected for language documentation and research remain untranscribed and unsearchable, but often a small amount of speech may have text translations available.
no code implementations • WS 2017 • Philippa Shoemark, James Kirby, Sharon Goldwater
Sociolinguistic research suggests that speakers modulate their language style in response to their audience.
no code implementations • EACL 2017 • Toms Bergmanis, Sharon Goldwater
A major motivation for unsupervised morphological analysis is to reduce the sparse data problem in under-resourced languages.
no code implementations • EACL 2017 • Philippa Shoemark, Debnil Sur, Luke Shrimpton, Iain Murray, Sharon Goldwater
Political surveys have indicated a relationship between a sense of Scottish identity and voting decisions in the 2014 Scottish Independence Referendum.
2 code implementations • 23 Mar 2017 • Herman Kamper, Karen Livescu, Sharon Goldwater
Unsupervised segmentation and clustering of unlabelled speech are core problems in zero-resource speech processing.
no code implementations • EACL 2017 • Sameer Bansal, Herman Kamper, Adam Lopez, Sharon Goldwater
We explore the problem of translating speech to text in low-resource scenarios where neither automatic speech recognition (ASR) nor machine translation (MT) are available, but we have training data in the form of audio paired with text translations.
no code implementations • 21 Sep 2016 • Sameer Bansal, Herman Kamper, Sharon Goldwater, Adam Lopez
Recent work on unsupervised term discovery (UTD) aims to identify and cluster repeated word-like units from audio alone.
5 code implementations • 22 Jun 2016 • Herman Kamper, Aren Jansen, Sharon Goldwater
We also show that the discovered clusters can be made less speaker- and gender-specific by using an unsupervised autoencoder-like feature extractor to learn better frame-level features (prior to embedding).
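The bottleneck idea behind an autoencoder-like feature extractor can be shown in a few lines. This is a deliberately simplified linear sketch, not the paper's extractor: a linear autoencoder minimising squared reconstruction error is equivalent to PCA, so we compute the bottleneck directly with an SVD instead of training a network.

```python
import numpy as np

def linear_autoencoder_features(frames: np.ndarray, k: int):
    """Project frames onto a k-dimensional bottleneck and
    reconstruct them. The bottleneck codes are the new
    frame-level features; reconstruction quality shows how
    much information the bottleneck retains."""
    mu = frames.mean(axis=0)
    X = frames - mu
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    codes = X @ Vt[:k].T           # bottleneck features (n x k)
    recon = codes @ Vt[:k] + mu    # reconstruction from bottleneck
    return codes, recon

rng = np.random.default_rng(0)
frames = rng.normal(size=(100, 13))
codes, recon = linear_autoencoder_features(frames, k=5)
```

A nonlinear extractor trained on aligned frame pairs (as in the paper) goes further, encouraging the bottleneck to discard nuisance factors such as speaker and gender rather than just low-variance directions.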
no code implementations • 9 Mar 2016 • Herman Kamper, Aren Jansen, Sharon Goldwater
In settings where only unlabelled speech data is available, speech technology needs to be developed without transcriptions, pronunciation dictionaries, or language modelling text.
no code implementations • TACL 2013 • John K Pate, Sharon Goldwater
Unsupervised parsing is a difficult task that infants readily perform.
no code implementations • TACL 2013 • Kairit Sirts, Sharon Goldwater
This paper explores the use of Adaptor Grammars, a nonparametric Bayesian modelling framework, for minimally supervised morphological segmentation.