no code implementations • ACL (SIGMORPHON) 2021 • Lucas F.E. Ashby, Travis M. Bartley, Simon Clematide, Luca Del Signore, Cameron Gibson, Kyle Gorman, Yeonju Lee-Sikka, Peter Makarov, Aidan Malanoski, Sean Miller, Omar Ortiz, Reuben Raff, Arundhati Sengupta, Bora Seo, Yulia Spektor, Winnie Yan
Grapheme-to-phoneme conversion is an important component in many speech technologies, but until recently there were no multilingual benchmarks for this task.
1 code implementation • NAACL (SIGMORPHON) 2022 • Khuyagbaatar Batsuren, Gábor Bella, Aryaman Arora, Viktor Martinović, Kyle Gorman, Zdeněk Žabokrtský, Amarsanaa Ganbold, Šárka Dohnalová, Magda Ševčíková, Kateřina Pelegrinová, Fausto Giunchiglia, Ryan Cotterell, Ekaterina Vylomova
The SIGMORPHON 2022 shared task on morpheme segmentation challenged systems to decompose a word into a sequence of morphemes and covered most types of morphology: compounds, derivations, and inflections.
Ranked #8 on Morpheme Segmentaiton on UniMorph 4.0
no code implementations • LREC 2022 • Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieraś, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate, Maria Ryskina, Sabrina J. Mielke, Elena Budianskaya, Charbel El-Khaissi, Tiago Pimentel, Michael Gasser, William Lane, Mohit Raj, Matt Coler, Jaime Rafael Montoya Samame, Delio Siticonatzi Camaiteri, Benoît Sagot, Esaú Zumaeta Rojas, Didier López Francis, Arturo Oncevay, Juan López Bautista, Gema Celeste Silva Villegas, Lucas Torroba Hennigen, Adam Ek, David Guriel, Peter Dirix, Jean-Philippe Bernardy, Andrey Scherbakov, Aziyana Bayyr-ool, Antonios Anastasopoulos, Roberto Zariquiey, Karina Sheifer, Sofya Ganieva, Hilaria Cruz, Ritván Karahóǧa, Stella Markantonatou, George Pavlidis, Matvey Plugaryov, Elena Klyachko, Ali Salehi, Candy Angulo, Jatayu Baxi, Andrew Krizhanovsky, Natalia Krizhanovskaya, Elizabeth Salesky, Clara Vania, Sardana Ivanova, Jennifer White, Rowan Hall Maudslay, Josef Valvoda, Ran Zmigrod, Paula Czarnowska, Irene Nikkarinen, Aelita Salchak, Brijesh Bhatt, Christopher Straughn, Zoey Liu, Jonathan North Washington, Yuval Pinter, Duygu Ataman, Marcin Wolinski, Totok Suhardijanto, Anna Yablonskaya, Niklas Stoehr, Hossep Dolatian, Zahroh Nuriah, Shyam Ratan, Francis M. Tyers, Edoardo M. Ponti, Grant Aiton, Aryaman Arora, Richard J. Hatcher, Ritesh Kumar, Jeremiah Young, Daria Rodionova, Anastasia Yemelina, Taras Andrushko, Igor Marchenko, Polina Mashkovtseva, Alexandra Serova, Emily Prud'hommeaux, Maria Nepomniashchaya, Fausto Giunchiglia, Eleanor Chodroff, Mans Hulden, Miikka Silfverberg, Arya D. McCarthy, David Yarowsky, Ryan Cotterell, Reut Tsarfaty, Ekaterina Vylomova
The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema.
no code implementations • 14 Apr 2022 • Kyle Gorman, Cyril Allauzen
We describe an algorithm which finds the shortest string for a weighted non-deterministic automaton over such semirings using the backwards shortest distance of an equivalent deterministic automaton (DFA) as a heuristic for A* search performed over a companion idempotent semiring, which is proven to return the shortest string.
no code implementations • 9 Oct 2021 • Géza Kiss, Kyle Gorman, Jan P. H. van Santen
We consider the problem of constructing matched groups such that the resulting groups are statistically similar with respect to their average values for multiple covariates.
no code implementations • Findings (EMNLP) 2021 • Kyle Gorman, Christo Kirov, Brian Roark, Richard Sproat
Ad hoc abbreviations are commonly found in informal communication channels that favor shorter messages.
1 code implementation • 11 Apr 2021 • Yang Zhang, Evelina Bakhturina, Kyle Gorman, Boris Ginsburg
Inverse text normalization (ITN) converts spoken-domain automatic speech recognition (ASR) output into written-domain text to improve the readability of the ASR output.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
no code implementations • EMNLP (WNUT) 2020 • Angie Waller, Kyle Gorman
Student reviews often make reference to professors' physical appearances.
no code implementations • EMNLP 2020 • Piotr Szymański, Kyle Gorman
Recent work raises concerns about the use of standard splits to compare natural language processing models.
no code implementations • WS 2020 • Kyle Gorman, Lucas F.E. Ashby, Aaron Goyzueta, Arya McCarthy, Shijie Wu, Daniel You
We describe the design and findings of the SIGMORPHON 2020 shared task on multilingual grapheme-to-phoneme conversion.
no code implementations • LREC 2020 • Jackson L. Lee, Lucas F.E. Ashby, M. Elizabeth Garza, Yeonju Lee-Sikka, Sean Miller, Alan Wong, Arya D. McCarthy, Kyle Gorman
We introduce WikiPron, an open-source command-line tool for extracting pronunciation data from Wiktionary, a collaborative multilingual online dictionary.
no code implementations • LREC 2020 • Arya D. McCarthy, Christo Kirov, Matteo Grella, Amrit Nidhi, Patrick Xia, Kyle Gorman, Ekaterina Vylomova, Sabrina J. Mielke, Garrett Nicolai, Miikka Silfverberg, Timofey Arkhangelskiy, Nataly Krizhanovsky, Andrew Krizhanovsky, Elena Klyachko, Alexey Sorokin, John Mansfield, Valts Ern{\v{s}}treits, Yuval Pinter, Cass Jacobs, ra L., Ryan Cotterell, Mans Hulden, David Yarowsky
The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema.
no code implementations • CONLL 2019 • Kyle Gorman, Arya D. McCarthy, Ryan Cotterell, Ekaterina Vylomova, Miikka Silfverberg, Magdalena Markowska
We conduct a manual error analysis of the CoNLL-SIGMORPHON Shared Task on Morphological Reinflection.
1 code implementation • ACL 2019 • Kyle Gorman, Steven Bedrick
It is standard practice in speech {\&} language technology to rank systems according to their performance on a test set held out for evaluation.
no code implementations • ACL 2019 • Sabrina J. Mielke, Ryan Cotterell, Kyle Gorman, Brian Roark, Jason Eisner
Trying to answer the question of what features difficult languages have in common, we try and fail to reproduce our earlier (Cotterell et al., 2018) observation about morphological complexity and instead reveal far simpler statistics of the data that seem to drive complexity in a much larger sample.
no code implementations • CL 2019 • Hao Zhang, Richard Sproat, Axel H. Ng, Felix Stahlberg, Xiaochang Peng, Kyle Gorman, Brian Roark
One problem that has been somewhat resistant to effective machine learning solutions is text normalization for speech applications such as text-to-speech synthesis (TTS).
no code implementations • WS 2017 • Joel Adams, Steven Bedrick, Gerasimos Fergadiotis, Kyle Gorman, Jan van Santen
We present a system for automatically detecting and classifying phonologically anomalous productions in the speech of individuals with aphasia.
no code implementations • 21 Sep 2016 • Ke Wu, Kyle Gorman, Richard Sproat
In speech-applications such as text-to-speech (TTS) or automatic speech recognition (ASR), \emph{text normalization} refers to the task of converting from a \emph{written} representation into a representation of how the text is to be \emph{spoken}.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
no code implementations • TACL 2016 • Kyle Gorman, Richard Sproat
We propose two models for verbalizing numbers, a key component in speech recognition and synthesis systems.