no code implementations • 11 Apr 2022 • Vishal Sunder, Eric Fosler-Lussier, Samuel Thomas, Hong-Kwang J. Kuo, Brian Kingsbury
Recent advances in End-to-End (E2E) Spoken Language Understanding (SLU) have been primarily due to effective pretraining of speech representations.
no code implementations • 11 Apr 2022 • Vishal Sunder, Samuel Thomas, Hong-Kwang J. Kuo, Jatin Ganhotra, Brian Kingsbury, Eric Fosler-Lussier
In the absence of gold transcripts to fine-tune an ASR model, our model outperforms this baseline by a significant margin of 10% absolute F1 score.
no code implementations • 26 Feb 2022 • Samuel Thomas, Hong-Kwang J. Kuo, Brian Kingsbury, George Saon
In this paper, we propose a novel text representation and training methodology that allows E2E SLU systems to be effectively constructed using these text resources.
no code implementations • 26 Feb 2022 • Samuel Thomas, Brian Kingsbury, George Saon, Hong-Kwang J. Kuo
We observe 20-45% relative word error rate (WER) reduction in these settings with this novel LM style customization technique using only unpaired text data from the new domains.
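The "relative" WER reduction quoted above is measured against a baseline system. A minimal sketch of that computation, with invented WER values for illustration (not numbers from the paper):

```python
# Relative WER reduction: the fraction of the baseline system's WER that the
# customized system removes. The 0.10 / 0.07 values below are made up.

def relative_wer_reduction(baseline_wer, adapted_wer):
    """Fractional reduction of WER relative to the baseline system."""
    return (baseline_wer - adapted_wer) / baseline_wer

print(f"{relative_wer_reduction(0.10, 0.07):.0%}")  # prints 30%
```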
Automatic Speech Recognition (ASR) +1
no code implementations • 28 Jan 2022 • Hong-Kwang J. Kuo, Zoltan Tuske, Samuel Thomas, Brian Kingsbury, George Saon
The goal of spoken language understanding (SLU) systems is to determine the meaning of the input speech signal, unlike speech recognition which aims to produce verbatim transcripts.
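The contrast above can be made concrete with target representations: ASR predicts a verbatim transcript, while an SLU system predicts the meaning, often as an intent plus slots. The utterance, intent label, and slot schema below are invented for this sketch, not taken from the papers:

```python
# ASR target: a verbatim transcript of the input speech.
asr_target = "i would like to book a flight from boston to denver tomorrow"

# SLU target: the meaning of the same utterance as an intent plus slots.
# The intent/slot names here are hypothetical examples.
slu_target = {
    "intent": "book_flight",
    "slots": {"from_city": "boston", "to_city": "denver", "date": "tomorrow"},
}

def format_slu(parse):
    """Render an intent/slot parse as a flat string, one common E2E SLU target format."""
    slots = " ".join(f"{k}={v}" for k, v in sorted(parse["slots"].items()))
    return f"{parse['intent']} {slots}"

print(format_slu(slu_target))
# prints: book_flight date=tomorrow from_city=boston to_city=denver
```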
no code implementations • 18 Aug 2021 • Jatin Ganhotra, Samuel Thomas, Hong-Kwang J. Kuo, Sachindra Joshi, George Saon, Zoltán Tüske, Brian Kingsbury
End-to-end spoken language understanding (SLU) systems that process human-human or human-computer interactions are often context independent and process each turn of a conversation independently.
1 code implementation • 8 Apr 2021 • Samuel Thomas, Hong-Kwang J. Kuo, George Saon, Zoltán Tüske, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory
We present a comprehensive study on building and adapting RNN transducer (RNN-T) models for spoken language understanding (SLU).
Automatic Speech Recognition (ASR) +2
no code implementations • 16 Nov 2020 • Edmilson Morais, Hong-Kwang J. Kuo, Samuel Thomas, Zoltan Tuske, Brian Kingsbury
Transformer networks and self-supervised pre-training have consistently delivered state-of-the-art results in the field of natural language processing (NLP); however, their merits in the field of spoken language understanding (SLU) still need further investigation.
no code implementations • 30 Sep 2020 • Hong-Kwang J. Kuo, Zoltán Tüske, Samuel Thomas, Yinghui Huang, Kartik Audhkhasi, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory, Luis Lastras
For our speech-to-entities experiments on the ATIS corpus, both the CTC and attention models showed impressive ability to skip non-entity words: there was little degradation when trained on just entities versus full transcripts.
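Training on "just entities" implies building targets that drop the non-entity words of each transcript. A minimal sketch of that target construction, with an invented ATIS-style transcript and entity spans (the annotation format here is an assumption, not the paper's):

```python
# Build an entities-only training target by keeping only transcript words
# that fall inside labeled entity spans. Example data is invented.

def entity_only_target(words, entity_spans):
    """Return the transcript restricted to words inside entity spans.

    entity_spans: list of (start, end) word-index pairs, end exclusive.
    """
    keep = set()
    for start, end in entity_spans:
        keep.update(range(start, end))
    return " ".join(w for i, w in enumerate(words) if i in keep)

words = "show me flights from boston to denver on monday".split()
spans = [(4, 5), (6, 7), (8, 9)]  # boston, denver, monday
print(entity_only_target(words, spans))  # prints: boston denver monday
```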
no code implementations • 27 Apr 2016 • George Saon, Tom Sercu, Steven Rennie, Hong-Kwang J. Kuo
We describe a collection of acoustic and language modeling techniques that lowered the word error rate of our English conversational telephone LVCSR system to a record 6.6% on the Switchboard subset of the Hub5 2000 evaluation test set.
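Word error rate, the metric cited above, is the word-level edit distance between reference and hypothesis, divided by the reference length. A minimal dynamic-programming sketch (not the official Hub5 scoring pipeline):

```python
# WER = (substitutions + deletions + insertions) / reference word count,
# computed here with a standard Levenshtein DP over word sequences.

def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # one deletion: 1/6
```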
Ranked #5 on Speech Recognition on swb_hub_500 (WER, fullSWBCH)
no code implementations • 21 May 2015 • George Saon, Hong-Kwang J. Kuo, Steven Rennie, Michael Picheny
We describe the latest improvements to the IBM English conversational telephone speech recognition system.
Ranked #11 on Speech Recognition on Switchboard + Hub500