no code implementations • 14 Apr 2024 • Dongseong Hwang, Weiran Wang, Zhuoyuan Huo, Khe Chai Sim, Pedro Moreno Mengibar
While Transformers have revolutionized deep learning, their quadratic attention complexity hinders their ability to process infinitely long inputs.
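The quadratic cost mentioned here comes from materializing an n×n score matrix in self-attention. A minimal NumPy sketch (illustrative only, not this paper's method) of plain softmax attention weights, whose memory grows as n²:

```python
import numpy as np

def attention_weights(q, k):
    """Plain softmax attention: the (n, n) score matrix makes
    memory and compute grow quadratically in sequence length n."""
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)                       # shape (n, n)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return w / w.sum(axis=-1, keepdims=True)            # rows sum to 1

n, d = 8, 4
rng = np.random.default_rng(0)
w = attention_weights(rng.standard_normal((n, d)), rng.standard_normal((n, d)))
assert w.shape == (n, n)  # doubling n quadruples this matrix
```

Doubling the input length quadruples the score matrix, which is why unbounded-length inputs are out of reach for vanilla attention.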
no code implementations • 27 Feb 2024 • Rohit Prabhavalkar, Zhong Meng, Weiran Wang, Adam Stooke, Xingyu Cai, Yanzhang He, Arun Narayanan, Dongseong Hwang, Tara N. Sainath, Pedro J. Moreno
In the present work, we study one such strategy: applying multiple frame reduction layers in the encoder to compress encoder outputs into a small number of output frames.
Automatic Speech Recognition (ASR)
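Frame reduction as summarized above can be sketched by concatenating consecutive encoder frames; the stack factor and feature sizes below are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def reduce_frames(x, stack=4):
    """Concatenate every `stack` consecutive frames along the feature axis,
    shrinking the time dimension by a factor of `stack`."""
    t, d = x.shape
    t_out = t // stack
    return x[: t_out * stack].reshape(t_out, stack * d)

x = np.zeros((100, 16))       # 100 encoder frames of dimension 16
y = reduce_frames(x, stack=4)
assert y.shape == (25, 64)    # 4x fewer frames for downstream layers
```

Fewer output frames mean less work for the decoder, at the price of a coarser time resolution.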
no code implementations • 13 Dec 2023 • Oscar Chang, Dongseong Hwang, Olivier Siohan
In this work, we revisit the entropy semiring for neural speech recognition models, and show how alignment entropy can be used to supervise models through regularization or distillation.
no code implementations • 22 Sep 2023 • Weiran Wang, Rohit Prabhavalkar, Dongseong Hwang, Qiujia Li, Khe Chai Sim, Bo Li, James Qin, Xingyu Cai, Adam Stooke, Zhong Meng, CJ Zheng, Yanzhang He, Tara Sainath, Pedro Moreno Mengibar
In this work, we investigate two popular end-to-end automatic speech recognition (ASR) models, namely Connectionist Temporal Classification (CTC) and RNN-Transducer (RNN-T), for offline recognition of voice search queries, with up to 2B model parameters.
Automatic Speech Recognition (ASR)
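For context on the CTC model studied here, CTC's decoding rule (collapse repeated labels, then drop blanks) fits in a few lines; this is the standard CTC convention, not anything specific to the 2B-parameter models in the paper:

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse consecutive repeats, then drop blanks --
    the standard CTC mapping from per-frame labels to an output sequence."""
    out, prev = [], None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return out

# blank between the two 1s keeps them as distinct output tokens
assert ctc_greedy_decode([0, 1, 1, 0, 1, 2, 2, 0]) == [1, 1, 2]
```

RNN-T differs chiefly in adding a label-dependent prediction network, so its output distribution conditions on previously emitted tokens rather than assuming frame-wise independence.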
no code implementations • 16 Sep 2023 • Shefali Garg, Zhouyuan Huo, Khe Chai Sim, Suzan Schwartz, Mason Chua, Alëna Aksënova, Tsendsuren Munkhdalai, Levi King, Darryl Wright, Zion Mengesha, Dongseong Hwang, Tara Sainath, Françoise Beaufays, Pedro Moreno Mengibar
By combining the classifier output with coarse geographic information, we can select a subset of utterances from a large corpus of untranscribed short-form queries for semi-supervised learning at scale.
no code implementations • 31 May 2023 • Dongseong Hwang, Changwan Ryu, Khe Chai Sim
RNN-T is currently considered the industry standard in ASR, owing to its low word error rates (WERs) across benchmarks and its support for seamless streaming and long-form transcription.
no code implementations • 22 May 2023 • Qiujia Li, Bo Li, Dongseong Hwang, Tara N. Sainath, Pedro M. Mengibar
Speech data from different domains has distinct acoustic and linguistic characteristics.
no code implementations • 3 Feb 2023 • Bo Li, Dongseong Hwang, Zhouyuan Huo, Junwen Bai, Guru Prakash, Tara N. Sainath, Khe Chai Sim, Yu Zhang, Wei Han, Trevor Strohman, Francoise Beaufays
The FM encoder adapter and decoder are then finetuned to the target domain with a small amount of supervised in-domain data.
no code implementations • 4 Nov 2022 • Zhouyuan Huo, Khe Chai Sim, Bo Li, Dongseong Hwang, Tara N. Sainath, Trevor Strohman
Experimental results show that the proposed method outperforms existing algorithms on speech recognition while using fewer trainable parameters, less memory during training, and less training time.
Automatic Speech Recognition (ASR)
no code implementations • 11 Oct 2022 • Dongseong Hwang, Khe Chai Sim, Yu Zhang, Trevor Strohman
Knowledge distillation is an effective machine learning technique to transfer knowledge from a teacher model to a smaller student model, especially with unlabeled data.
Automatic Speech Recognition (ASR)
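The teacher-to-student transfer described above is commonly realized as a temperature-softened KL loss between teacher and student outputs; the logits and temperature below are illustrative assumptions, not the paper's values:

```python
import numpy as np

def softmax(z, temperature=1.0):
    z = z / temperature
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) at a raised temperature: the teacher's softened
    distribution serves as a soft label, so no ground truth is needed."""
    p = softmax(teacher_logits, temperature)   # teacher soft targets
    q = softmax(student_logits, temperature)   # student predictions
    return float(np.sum(p * (np.log(p) - np.log(q))))

teacher = np.array([[2.0, 0.5, -1.0]])
student = np.array([[1.5, 0.7, -0.5]])
loss = distill_loss(student, teacher)
assert loss >= 0.0  # KL divergence is non-negative
```

Because the loss depends only on model outputs, it applies directly to unlabeled data, which is the setting the summary highlights.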
no code implementations • 13 Apr 2022 • Shaojin Ding, Weiran Wang, Ding Zhao, Tara N. Sainath, Yanzhang He, Robert David, Rami Botros, Xin Wang, Rina Panigrahy, Qiao Liang, Dongseong Hwang, Ian McGraw, Rohit Prabhavalkar, Trevor Strohman
In this paper, we propose a dynamic cascaded encoder Automatic Speech Recognition (ASR) model, which unifies models for different deployment scenarios.
Automatic Speech Recognition (ASR)
no code implementations • 22 Mar 2022 • Dongseong Hwang, Khe Chai Sim, Zhouyuan Huo, Trevor Strohman
State-of-the-art automatic speech recognition (ASR) systems are trained with tens of thousands of hours of labeled speech data.
Automatic Speech Recognition (ASR)
no code implementations • 1 Oct 2021 • Zhouyuan Huo, Dongseong Hwang, Khe Chai Sim, Shefali Garg, Ananya Misra, Nikhil Siddhartha, Trevor Strohman, Françoise Beaufays
These models are typically trained on the server using transcribed speech data.
no code implementations • 1 Oct 2021 • Dongseong Hwang, Ananya Misra, Zhouyuan Huo, Nikhil Siddhartha, Shefali Garg, David Qiu, Khe Chai Sim, Trevor Strohman, Françoise Beaufays, Yanzhang He
Self- and semi-supervised learning methods have been actively investigated to reduce the amount of labeled training data required or to enhance model performance.