Search Results for author: Khe Chai Sim

Found 29 papers, 1 paper with code

TransformerFAM: Feedback attention is working memory

no code implementations • 14 Apr 2024 • Dongseong Hwang, Weiran Wang, Zhuoyuan Huo, Khe Chai Sim, Pedro Moreno Mengibar

While Transformers have revolutionized deep learning, their quadratic attention complexity hinders their ability to process infinitely long inputs.
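
For reference, the quadratic cost comes from standard scaled dot-product attention, which scores every query against every key. Below is a generic statement of the usual formulation, assuming a sequence of n tokens with model width d and a row-wise softmax. It does not describe the TransformerFAM feedback mechanism itself:

```latex
\begin{aligned}
S &= \frac{Q K^{\top}}{\sqrt{d}} \in \mathbb{R}^{n \times n}, \qquad Q, K, V \in \mathbb{R}^{n \times d} \\
\mathrm{Attention}(Q, K, V) &= \mathrm{softmax}(S)\, V \\
\text{cost} &= O(n^{2} d) \text{ time}, \quad O(n^{2}) \text{ memory for } S
\end{aligned}
```

Doubling the input length therefore roughly quadruples both the compute and the size of the score matrix.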

Hierarchical Recurrent Adapters for Efficient Multi-Task Adaptation of Large Speech Models

no code implementations • 25 Mar 2024 • Tsendsuren Munkhdalai, Youzheng Chen, Khe Chai Sim, Fadi Biadsy, Tara Sainath, Pedro Moreno Mengibar

However, their per-task parameter overhead is still considered high when the number of downstream tasks to adapt to is large.

Automatic Speech Recognition, speech-recognition, +1 more

Profit: Benchmarking Personalization and Robustness Trade-off in Federated Prompt Tuning

no code implementations • 6 Oct 2023 • Liam Collins, Shanshan Wu, Sewoong Oh, Khe Chai Sim

In many applications of federated learning (FL), clients desire models that are personalized using their local data, yet are also robust in the sense that they retain general global knowledge.

Benchmarking, Federated Learning, +2 more

Contextual Biasing with the Knuth-Morris-Pratt Matching Algorithm

no code implementations • 29 Sep 2023 • Weiran Wang, Zelin Wu, Diamantino Caseiro, Tsendsuren Munkhdalai, Khe Chai Sim, Pat Rondon, Golan Pundak, Gan Song, Rohit Prabhavalkar, Zhong Meng, Ding Zhao, Tara Sainath, Pedro Moreno Mengibar

Contextual biasing refers to the problem of biasing automatic speech recognition (ASR) systems toward rare entities that are relevant to a specific user or application scenario.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1 more
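
The Knuth-Morris-Pratt (KMP) algorithm named in this title finds every occurrence of a pattern in a sequence in linear time by precomputing a failure function, so a partial match never has to be rescanned from scratch. The sketch below is the textbook algorithm applied to word tokens, assuming biasing phrases and recognition hypotheses are plain token lists. How the paper turns partial matches into biasing scores during decoding is not shown, and the example phrase is hypothetical.

```python
from typing import List, Sequence


def failure_function(pattern: Sequence) -> List[int]:
    """fail[i] = length of the longest proper prefix of pattern[:i+1] that is also its suffix."""
    fail = [0] * len(pattern)
    k = 0  # length of the currently matched prefix
    for i in range(1, len(pattern)):
        # Fall back through shorter borders until the next token can extend one.
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_find_all(text: Sequence, pattern: Sequence) -> List[int]:
    """Return the start index of every (possibly overlapping) occurrence of pattern in text."""
    if not pattern:
        return []
    fail = failure_function(pattern)
    matches = []
    k = 0  # number of pattern tokens matched so far
    for i, token in enumerate(text):
        while k > 0 and token != pattern[k]:
            k = fail[k - 1]
        if token == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = fail[k - 1]  # keep the longest border so overlapping matches are found
    return matches


# Hypothetical biasing phrase and recognition hypothesis, as word tokens.
hypothesis = "play songs by sigur ros".split()
phrase = "sigur ros".split()
print(kmp_find_all(hypothesis, phrase))  # [3]
```

Because `fail[k - 1]` says how much of a phrase is still matched after a mismatch, the matched length `k` is the kind of small per-hypothesis state a streaming decoder could carry forward token by token.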

Massive End-to-end Models for Short Search Queries

no code implementations • 22 Sep 2023 • Weiran Wang, Rohit Prabhavalkar, Dongseong Hwang, Qiujia Li, Khe Chai Sim, Bo Li, James Qin, Xingyu Cai, Adam Stooke, Zhong Meng, CJ Zheng, Yanzhang He, Tara Sainath, Pedro Moreno Mengibar

In this work, we investigate two popular end-to-end automatic speech recognition (ASR) models, namely Connectionist Temporal Classification (CTC) and RNN-Transducer (RNN-T), for offline recognition of voice search queries, with up to 2B model parameters.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1 more
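
As a point of reference for the CTC side of this comparison, greedy (best-path) CTC decoding takes the highest-scoring label at each frame, merges consecutive repeats, and then removes blank symbols. A minimal sketch, assuming label index 0 is the blank and per-frame scores are plain Python lists. It is not the decoding setup used in the paper.

```python
from itertools import groupby
from typing import List, Sequence

BLANK = 0  # assumption: index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], blank: int = BLANK) -> List[int]:
    # 1) Best label for every frame.
    best = [max(range(len(scores)), key=lambda j: scores[j]) for scores in frame_scores]
    # 2) Merge consecutive repeats, then 3) drop blanks.
    return [label for label, _ in groupby(best) if label != blank]


# Per-frame argmax labels: 1 1 0 1 2 2 0 -> merged: 1 0 1 2 0 -> blanks removed: 1 1 2
frames = [
    [0.1, 0.8, 0.1],
    [0.1, 0.7, 0.2],
    [0.9, 0.05, 0.05],
    [0.2, 0.6, 0.2],
    [0.1, 0.1, 0.8],
    [0.2, 0.1, 0.7],
    [0.6, 0.2, 0.2],
]
print(ctc_greedy_decode(frames))  # [1, 1, 2]
```

The blank between the two runs of label 1 is what lets CTC emit the same label twice in a row; without it, the repeats would be merged into one.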

Improving Speech Recognition for African American English With Audio Classification

no code implementations • 16 Sep 2023 • Shefali Garg, Zhouyuan Huo, Khe Chai Sim, Suzan Schwartz, Mason Chua, Alëna Aksënova, Tsendsuren Munkhdalai, Levi King, Darryl Wright, Zion Mengesha, Dongseong Hwang, Tara Sainath, Françoise Beaufays, Pedro Moreno Mengibar

By combining the classifier output with coarse geographic information, we can select a subset of utterances from a large corpus of untranscribed short-form queries for semi-supervised learning at scale.

Audio Classification, Automatic Speech Recognition, +2 more
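
The selection step in this snippet can be pictured as a filter over an untranscribed pool. The sketch is purely illustrative: the `Utterance` fields, the region labels, the `select_for_ssl` name, and the 0.5 threshold are all hypothetical. The paper's actual classifier, geographic signal, and selection rule are not specified in this excerpt.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class Utterance:
    utt_id: str
    classifier_score: float  # hypothetical: classifier probability that the target dialect is present
    region: str              # hypothetical: coarse geographic bucket


def select_for_ssl(utts: Iterable[Utterance], regions: Set[str], threshold: float = 0.5) -> List[str]:
    """Hypothetical rule: keep utterances scored highly by the classifier AND from target regions."""
    return [u.utt_id for u in utts if u.classifier_score >= threshold and u.region in regions]


pool = [
    Utterance("utt1", 0.92, "region_1"),
    Utterance("utt2", 0.95, "region_2"),
    Utterance("utt3", 0.10, "region_1"),
]
print(select_for_ssl(pool, {"region_1"}))  # ['utt1']
```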

Edit Distance based RL for RNNT decoding

no code implementations • 31 May 2023 • Dongseong Hwang, Changwan Ryu, Khe Chai Sim

RNN-T is currently considered the industry standard in ASR due to its excellent word error rates (WERs) on various benchmarks and its support for seamless streaming and long-form transcription.
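
Edit distance is also the quantity behind word error rate: WER = (S + D + I) / N, where S, D, and I are the substitutions, deletions, and insertions in the minimum-cost alignment and N is the number of reference words. A minimal sketch of word-level Levenshtein distance and WER; it shows the metric only, not the paper's RL training or RNN-T decoding.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum number of substitutions, deletions, and insertions."""
    prev = list(range(len(hyp) + 1))  # empty ref -> hyp[:j] needs j insertions
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)   # ref[:i] -> empty hyp needs i deletions
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,              # delete r
                curr[j - 1] + 1,          # insert h
                prev[j - 1] + (r != h),   # substitute (cost 0 if the words match)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if not ref_words:
        raise ValueError("WER is undefined for an empty reference")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


# One substitution (sat -> sit) and one deletion (the): 2 errors / 6 reference words.
print(wer("the cat sat on the mat", "the cat sit on mat"))
```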

Efficient Domain Adaptation for Speech Foundation Models

no code implementations • 3 Feb 2023 • Bo Li, Dongseong Hwang, Zhouyuan Huo, Junwen Bai, Guru Prakash, Tara N. Sainath, Khe Chai Sim, Yu Zhang, Wei Han, Trevor Strohman, Francoise Beaufays

The foundation model (FM) encoder adapter and decoder are then fine-tuned to the target domain with a small amount of supervised in-domain data.

Decoder, Domain Adaptation, +3 more

Resource-Efficient Transfer Learning From Speech Foundation Model Using Hierarchical Feature Fusion

no code implementations • 4 Nov 2022 • Zhouyuan Huo, Khe Chai Sim, Bo Li, Dongseong Hwang, Tara N. Sainath, Trevor Strohman

Experimental results show that the proposed method can achieve better speech recognition performance than existing algorithms, with fewer trainable parameters, lower memory cost, and faster training.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +2 more

Comparison of Soft and Hard Target RNN-T Distillation for Large-scale ASR

no code implementations • 11 Oct 2022 • Dongseong Hwang, Khe Chai Sim, Yu Zhang, Trevor Strohman

Knowledge distillation is an effective machine learning technique to transfer knowledge from a teacher model to a smaller student model, especially with unlabeled data.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +3 more
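
In generic terms, soft-target distillation trains the student to match the teacher's full output distribution, while hard-target distillation trains it only on the teacher's top-scoring label, much like training on a pseudo-label. The sketch compares the two losses on a single output distribution in plain Python. The `temperature` parameter and the toy logits are assumptions, the common T² gradient scaling is omitted, and how the paper applies these losses to RNN-T outputs is not shown.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_target_kd_loss(teacher_logits: Sequence[float], student_logits: Sequence[float],
                        temperature: float = 1.0) -> float:
    """KL(teacher || student): penalizes mismatch on every output class."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)


def hard_target_kd_loss(teacher_logits: Sequence[float], student_logits: Sequence[float]) -> float:
    """Cross-entropy of the student against the teacher's single best label."""
    label = max(range(len(teacher_logits)), key=lambda i: teacher_logits[i])
    return -math.log(softmax(student_logits)[label])


teacher = [2.0, 1.0, 0.1]  # toy logits
student = [1.5, 1.2, 0.3]
print(soft_target_kd_loss(teacher, student, temperature=2.0))
print(hard_target_kd_loss(teacher, student))
```

The soft loss keeps the teacher's relative confidence in the runner-up classes, which the hard loss discards.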

Large vocabulary speech recognition for languages of Africa: multilingual modeling and self-supervised learning

no code implementations • 5 Aug 2022 • Sandy Ritchie, You-Chi Cheng, Mingqing Chen, Rajiv Mathews, Daan van Esch, Bo Li, Khe Chai Sim

Almost none of the 2,000+ languages spoken in Africa have widely available automatic speech recognition systems, and the required data is also only available for a few languages.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +2 more

UserLibri: A Dataset for ASR Personalization Using Only Text

no code implementations • 2 Jul 2022 • Theresa Breiner, Swaroop Ramaswamy, Ehsan Variani, Shefali Garg, Rajiv Mathews, Khe Chai Sim, Kilol Gupta, Mingqing Chen, Lara McConnaughey

We experiment on a user-clustered LibriSpeech corpus, supplemented with personalized text-only data for each user from Project Gutenberg.

Language Modelling, speech-recognition, +1 more

Pseudo Label Is Better Than Human Label

no code implementations • 22 Mar 2022 • Dongseong Hwang, Khe Chai Sim, Zhouyuan Huo, Trevor Strohman

State-of-the-art automatic speech recognition (ASR) systems are trained with tens of thousands of hours of labeled speech data.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +2 more

Joint Unsupervised and Supervised Training for Multilingual ASR

no code implementations • 15 Nov 2021 • Junwen Bai, Bo Li, Yu Zhang, Ankur Bapna, Nikhil Siddhartha, Khe Chai Sim, Tara N. Sainath

Our average WER across all languages outperforms the average monolingual baseline by 33.3% and the state-of-the-art 2-stage XLSR by 32%.

Language Modelling, Masked Language Modeling, +3 more
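
The snippet does not say whether the 33.3% and 32% figures are relative or absolute. Assuming they are relative WER reductions, they follow the usual formula 100 × (baseline WER - new WER) / baseline WER. The WER values in the example are hypothetical, chosen only to show what a one-third relative reduction looks like.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent (assumes the reported figures are relative)."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical numbers: cutting WER from 15.0% to 10.0% is a one-third relative reduction.
print(round(relative_wer_reduction(15.0, 10.0), 1))  # 33.3
```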

Fast Contextual Adaptation with Neural Associative Memory for On-Device Personalized Speech Recognition

no code implementations • 5 Oct 2021 • Tsendsuren Munkhdalai, Khe Chai Sim, Angad Chandorkar, Fan Gao, Mason Chua, Trevor Strohman, Françoise Beaufays

Fast contextual adaptation has been shown to be effective in improving Automatic Speech Recognition (ASR) of rare words, and when combined with on-device personalized training, it can yield even better recognition results.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +3 more

Incremental Layer-wise Self-Supervised Learning for Efficient Speech Domain Adaptation On Device

no code implementations • 1 Oct 2021 • Zhouyuan Huo, Dongseong Hwang, Khe Chai Sim, Shefali Garg, Ananya Misra, Nikhil Siddhartha, Trevor Strohman, Françoise Beaufays

These models are typically trained on the server using transcribed speech data.

Domain Adaptation, Self-Supervised Learning, +2 more

Large-scale ASR Domain Adaptation using Self- and Semi-supervised Learning

no code implementations • 1 Oct 2021 • Dongseong Hwang, Ananya Misra, Zhouyuan Huo, Nikhil Siddhartha, Shefali Garg, David Qiu, Khe Chai Sim, Trevor Strohman, Françoise Beaufays, Yanzhang He

Self- and semi-supervised learning methods have been actively investigated to reduce the amount of labeled training data needed or to improve model performance.

Domain Adaptation

BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition

no code implementations • 27 Sep 2021 • Yu Zhang, Daniel S. Park, Wei Han, James Qin, Anmol Gulati, Joel Shor, Aren Jansen, Yuanzhong Xu, Yanping Huang, Shibo Wang, Zongwei Zhou, Bo Li, Min Ma, William Chan, Jiahui Yu, Yongqiang Wang, Liangliang Cao, Khe Chai Sim, Bhuvana Ramabhadran, Tara N. Sainath, Françoise Beaufays, Zhifeng Chen, Quoc V. Le, Chung-Cheng Chiu, Ruoming Pang, Yonghui Wu

We summarize the results of a host of efforts using giant automatic speech recognition (ASR) models pre-trained using large, diverse unlabeled datasets containing approximately a million hours of audio.

Ranked #1 on Speech Recognition on Common Voice

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +3 more

On-Device Personalization of Automatic Speech Recognition Models for Disordered Speech

no code implementations • 18 Jun 2021 • Katrin Tomanek, Françoise Beaufays, Julie Cattiau, Angad Chandorkar, Khe Chai Sim

While current state-of-the-art Automatic Speech Recognition (ASR) systems achieve high accuracy on typical speech, they suffer from significant performance degradation on disordered speech and other atypical speech patterns.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1 more

Low-rank Gradient Approximation For Memory-Efficient On-device Training of Deep Neural Network

no code implementations • 24 Jan 2020 • Mary Gooneratne, Khe Chai Sim, Petr Zadrazil, Andreas Kabel, Françoise Beaufays, Giovanni Motta

Training machine learning models on mobile devices has the potential to improve both the privacy and the accuracy of the models.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1 more

Personalization of End-to-end Speech Recognition On Mobile Devices For Named Entities

no code implementations • 14 Dec 2019 • Khe Chai Sim, Françoise Beaufays, Arnaud Benard, Dhruv Guliani, Andreas Kabel, Nikhil Khare, Tamar Lucassen, Petr Zadrazil, Harry Zhang, Leif Johnson, Giovanni Motta, Lillian Zhou

With speech input, if the user corrects only the names, the name recall rate improves to 64.4%.

speech-recognition, Speech Recognition

An Investigation Into On-device Personalization of End-to-end Automatic Speech Recognition Models

no code implementations • 14 Sep 2019 • Khe Chai Sim, Petr Zadrazil, Françoise Beaufays

Speaker-independent speech recognition systems trained with data from many users are generally robust against speaker variability and work well for a large population of speakers.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1 more

Streaming End-to-end Speech Recognition For Mobile Devices

2 code implementations • 15 Nov 2018 • Yanzhang He, Tara N. Sainath, Rohit Prabhavalkar, Ian McGraw, Raziel Alvarez, Ding Zhao, David Rybach, Anjuli Kannan, Yonghui Wu, Ruoming Pang, Qiao Liang, Deepti Bhatia, Yuan Shangguan, Bo Li, Golan Pundak, Khe Chai Sim, Tom Bagby, Shuo-Yiin Chang, Kanishka Rao, Alexander Gruenstein

End-to-end (E2E) models, which directly predict output character sequences given input speech, are good candidates for on-device speech recognition.

speech-recognition, Speech Recognition

Toward domain-invariant speech recognition via large scale training

no code implementations • 16 Aug 2018 • Arun Narayanan, Ananya Misra, Khe Chai Sim, Golan Pundak, Anshuman Tripathi, Mohamed Elfeky, Parisa Haghani, Trevor Strohman, Michiel Bacchiani

More importantly, such models generalize better to unseen conditions and allow for rapid adaptation: we show that by using as little as 10 hours of data from a new domain, an adapted domain-invariant model can match the performance of a domain-specific model trained from scratch using 70 times as much data.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1 more
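
Reading the adaptation comparison literally, and assuming the 70x factor refers to hours of training audio, the from-scratch domain-specific model used roughly:

```latex
T_{\text{scratch}} \approx 70 \times T_{\text{adapt}} = 70 \times 10\ \text{h} = 700\ \text{h}
```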

Understanding Recurrent Neural State Using Memory Signatures

no code implementations • 11 Feb 2018 • Skanda Koppula, Khe Chai Sim, Kean Chin

We demonstrate this method's usefulness in revealing information divergence in the bases of recurrent factorized kernels, visualizing the character-level differences between the memory of n-gram and recurrent language models, and extracting knowledge of history encoded in the layers of grapheme-based end-to-end ASR networks.

Multi-Dialect Speech Recognition With A Single Sequence-To-Sequence Model

no code implementations • 5 Dec 2017 • Bo Li, Tara N. Sainath, Khe Chai Sim, Michiel Bacchiani, Eugene Weinstein, Patrick Nguyen, Zhifeng Chen, Yonghui Wu, Kanishka Rao

Sequence-to-sequence models provide a simple and elegant solution for building speech recognition systems by folding separate components of a typical system, namely acoustic (AM), pronunciation (PM) and language (LM) models into a single neural network.

speech-recognition, Speech Recognition
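
For context, the conventional pipeline this snippet refers to factorizes recognition with Bayes' rule and introduces a phone (or other subword-unit) sequence Q. A sequence-to-sequence model instead models the output sequence directly. The block below is a standard textbook statement, assuming X is the acoustic feature sequence, W the word sequence, and y_1, ..., y_U the output units (for example, characters) of the single network. The approximation in the second line assumes the acoustics depend on W only through Q.

```latex
\begin{aligned}
\hat{W} &= \arg\max_{W} P(W \mid X) = \arg\max_{W} p(X \mid W)\, P(W) \\
        &\approx \arg\max_{W} \Big[ \sum_{Q} \underbrace{p(X \mid Q)}_{\text{AM}} \, \underbrace{P(Q \mid W)}_{\text{PM}} \Big] \underbrace{P(W)}_{\text{LM}} \\[4pt]
P(y \mid X) &= \prod_{u=1}^{U} P(y_u \mid y_{<u}, X) \qquad \text{(single sequence-to-sequence network)}
\end{aligned}
```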