Search Results for author: Kyle Gorman

Found 22 papers, 3 papers with code

Results of the Second SIGMORPHON Shared Task on Multilingual Grapheme-to-Phoneme Conversion

no code implementations • ACL (SIGMORPHON) 2021 • Lucas F.E. Ashby, Travis M. Bartley, Simon Clematide, Luca Del Signore, Cameron Gibson, Kyle Gorman, Yeonju Lee-Sikka, Peter Makarov, Aidan Malanoski, Sean Miller, Omar Ortiz, Reuben Raff, Arundhati Sengupta, Bora Seo, Yulia Spektor, Winnie Yan

Grapheme-to-phoneme conversion is an important component in many speech technologies, but until recently there were no multilingual benchmarks for this task.

The SIGMORPHON 2022 Shared Task on Morpheme Segmentation

1 code implementation • NAACL (SIGMORPHON) 2022 • Khuyagbaatar Batsuren, Gábor Bella, Aryaman Arora, Viktor Martinović, Kyle Gorman, Zdeněk Žabokrtský, Amarsanaa Ganbold, Šárka Dohnalová, Magda Ševčíková, Kateřina Pelegrinová, Fausto Giunchiglia, Ryan Cotterell, Ekaterina Vylomova

The SIGMORPHON 2022 shared task on morpheme segmentation challenged systems to decompose a word into a sequence of morphemes and covered most types of morphology: compounds, derivations, and inflections.

Ranked #8 on Morpheme Segmentation on UniMorph 4.0

Morpheme Segmentation Segmentation +1

UniMorph 4.0: Universal Morphology

no code implementations • LREC 2022 • Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieraś, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate, Maria Ryskina, Sabrina J. Mielke, Elena Budianskaya, Charbel El-Khaissi, Tiago Pimentel, Michael Gasser, William Lane, Mohit Raj, Matt Coler, Jaime Rafael Montoya Samame, Delio Siticonatzi Camaiteri, Benoît Sagot, Esaú Zumaeta Rojas, Didier López Francis, Arturo Oncevay, Juan López Bautista, Gema Celeste Silva Villegas, Lucas Torroba Hennigen, Adam Ek, David Guriel, Peter Dirix, Jean-Philippe Bernardy, Andrey Scherbakov, Aziyana Bayyr-ool, Antonios Anastasopoulos, Roberto Zariquiey, Karina Sheifer, Sofya Ganieva, Hilaria Cruz, Ritván Karahóǧa, Stella Markantonatou, George Pavlidis, Matvey Plugaryov, Elena Klyachko, Ali Salehi, Candy Angulo, Jatayu Baxi, Andrew Krizhanovsky, Natalia Krizhanovskaya, Elizabeth Salesky, Clara Vania, Sardana Ivanova, Jennifer White, Rowan Hall Maudslay, Josef Valvoda, Ran Zmigrod, Paula Czarnowska, Irene Nikkarinen, Aelita Salchak, Brijesh Bhatt, Christopher Straughn, Zoey Liu, Jonathan North Washington, Yuval Pinter, Duygu Ataman, Marcin Wolinski, Totok Suhardijanto, Anna Yablonskaya, Niklas Stoehr, Hossep Dolatian, Zahroh Nuriah, Shyam Ratan, Francis M. Tyers, Edoardo M. Ponti, Grant Aiton, Aryaman Arora, Richard J. Hatcher, Ritesh Kumar, Jeremiah Young, Daria Rodionova, Anastasia Yemelina, Taras Andrushko, Igor Marchenko, Polina Mashkovtseva, Alexandra Serova, Emily Prud'hommeaux, Maria Nepomniashchaya, Fausto Giunchiglia, Eleanor Chodroff, Mans Hulden, Miikka Silfverberg, Arya D. McCarthy, David Yarowsky, Ryan Cotterell, Reut Tsarfaty, Ekaterina Vylomova

The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema.

Morphological Inflection

A* shortest string decoding for non-idempotent semirings

no code implementations • 14 Apr 2022 • Kyle Gorman, Cyril Allauzen

We describe an algorithm which finds the shortest string for a weighted non-deterministic automaton over non-idempotent semirings, using the backwards shortest distance of an equivalent deterministic automaton (DFA) as a heuristic for A* search performed over a companion idempotent semiring; the search is proven to return the shortest string.

Group-matching algorithms for subjects and items

no code implementations • 9 Oct 2021 • Géza Kiss, Kyle Gorman, Jan P. H. van Santen

We consider the problem of constructing matched groups such that the resulting groups are statistically similar with respect to their average values for multiple covariates.

Structured abbreviation expansion in context

no code implementations • Findings (EMNLP) 2021 • Kyle Gorman, Christo Kirov, Brian Roark, Richard Sproat

Ad hoc abbreviations are commonly found in informal communication channels that favor shorter messages.

Spelling Correction

NeMo Inverse Text Normalization: From Development To Production

1 code implementation • 11 Apr 2021 • Yang Zhang, Evelina Bakhturina, Kyle Gorman, Boris Ginsburg

Inverse text normalization (ITN) converts spoken-domain automatic speech recognition (ASR) output into written-domain text to improve the readability of the ASR output.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Detecting Objectifying Language in Online Professor Reviews

no code implementations • EMNLP (WNUT) 2020 • Angie Waller, Kyle Gorman

Student reviews often make reference to professors' physical appearances.

Is the Best Better? Bayesian Statistical Model Comparison for Natural Language Processing

no code implementations • EMNLP 2020 • Piotr Szymański, Kyle Gorman

Recent work raises concerns about the use of standard splits to compare natural language processing models.

The SIGMORPHON 2020 Shared Task on Multilingual Grapheme-to-Phoneme Conversion

no code implementations • WS 2020 • Kyle Gorman, Lucas F.E. Ashby, Aaron Goyzueta, Arya McCarthy, Shijie Wu, Daniel You

We describe the design and findings of the SIGMORPHON 2020 shared task on multilingual grapheme-to-phoneme conversion.

Massively Multilingual Pronunciation Modeling with WikiPron

no code implementations • LREC 2020 • Jackson L. Lee, Lucas F.E. Ashby, M. Elizabeth Garza, Yeonju Lee-Sikka, Sean Miller, Alan Wong, Arya D. McCarthy, Kyle Gorman

We introduce WikiPron, an open-source command-line tool for extracting pronunciation data from Wiktionary, a collaborative multilingual online dictionary.

UniMorph 3.0: Universal Morphology

no code implementations • LREC 2020 • Arya D. McCarthy, Christo Kirov, Matteo Grella, Amrit Nidhi, Patrick Xia, Kyle Gorman, Ekaterina Vylomova, Sabrina J. Mielke, Garrett Nicolai, Miikka Silfverberg, Timofey Arkhangelskiy, Nataly Krizhanovsky, Andrew Krizhanovsky, Elena Klyachko, Alexey Sorokin, John Mansfield, Valts Ernštreits, Yuval Pinter, Cassandra L. Jacobs, Ryan Cotterell, Mans Hulden, David Yarowsky

Weird Inflects but OK: Making Sense of Morphological Generation Errors

no code implementations • CONLL 2019 • Kyle Gorman, Arya D. McCarthy, Ryan Cotterell, Ekaterina Vylomova, Miikka Silfverberg, Magdalena Markowska

We conduct a manual error analysis of the CoNLL-SIGMORPHON Shared Task on Morphological Reinflection.

Text Generation

We Need to Talk about Standard Splits

1 code implementation • ACL 2019 • Kyle Gorman, Steven Bedrick

It is standard practice in speech & language technology to rank systems according to their performance on a test set held out for evaluation.

What Kind of Language Is Hard to Language-Model?

no code implementations • ACL 2019 • Sabrina J. Mielke, Ryan Cotterell, Kyle Gorman, Brian Roark, Jason Eisner

Trying to answer the question of what features difficult languages have in common, we try and fail to reproduce our earlier (Cotterell et al., 2018) observation about morphological complexity and instead reveal far simpler statistics of the data that seem to drive complexity in a much larger sample.

Language Modelling Sentence

Neural Models of Text Normalization for Speech Applications

no code implementations • CL 2019 • Hao Zhang, Richard Sproat, Axel H. Ng, Felix Stahlberg, Xiaochang Peng, Kyle Gorman, Brian Roark

One problem that has been somewhat resistant to effective machine learning solutions is text normalization for speech applications such as text-to-speech synthesis (TTS).

BIG-bench Machine Learning Speech Synthesis +1

Improving homograph disambiguation with supervised machine learning

no code implementations • LREC 2018 • Kyle Gorman, Gleb Mazovetskiy, Vitaly Nikolaev

BIG-bench Machine Learning Speech Synthesis +1

Target word prediction and paraphasia classification in spoken discourse

no code implementations • WS 2017 • Joel Adams, Steven Bedrick, Gerasimos Fergadiotis, Kyle Gorman, Jan van Santen

We present a system for automatically detecting and classifying phonologically anomalous productions in the speech of individuals with aphasia.

Classification General Classification +2

Minimally Supervised Written-to-Spoken Text Normalization

no code implementations • 21 Sep 2016 • Ke Wu, Kyle Gorman, Richard Sproat

In speech applications such as text-to-speech (TTS) or automatic speech recognition (ASR), text normalization refers to the task of converting from a written representation into a representation of how the text is to be spoken.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1