no code implementations • 14 Apr 2024 • Dongseong Hwang, Weiran Wang, Zhuoyuan Huo, Khe Chai Sim, Pedro Moreno Mengibar
While Transformers have revolutionized deep learning, their quadratic attention complexity hinders their ability to process infinitely long inputs.
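The quadratic cost mentioned here comes from materializing an n×n score matrix in self-attention. A minimal NumPy sketch (illustrative only, not this paper's method) of plain softmax attention weights, whose memory grows as n²:

```python
import numpy as np

def attention_weights(q, k):
    """Plain softmax attention: the (n, n) score matrix makes
    memory and compute grow quadratically in sequence length n."""
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)                       # shape (n, n)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return w / w.sum(axis=-1, keepdims=True)            # rows sum to 1

n, d = 8, 4
rng = np.random.default_rng(0)
w = attention_weights(rng.standard_normal((n, d)), rng.standard_normal((n, d)))
assert w.shape == (n, n)  # doubling n quadruples this matrix
```

Doubling the input length quadruples the score matrix, which is why unbounded-length inputs are out of reach for vanilla attention.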
no code implementations • 27 Feb 2024 • Rohit Prabhavalkar, Zhong Meng, Weiran Wang, Adam Stooke, Xingyu Cai, Yanzhang He, Arun Narayanan, Dongseong Hwang, Tara N. Sainath, Pedro J. Moreno
In the present work, we study one such strategy: applying multiple frame reduction layers in the encoder to compress encoder outputs into a small number of output frames.
Automatic Speech Recognition (ASR)
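Frame reduction as summarized above can be sketched by concatenating consecutive encoder frames; the stack factor and feature sizes below are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def reduce_frames(x, stack=4):
    """Concatenate every `stack` consecutive frames along the feature axis,
    shrinking the time dimension by a factor of `stack`."""
    t, d = x.shape
    t_out = t // stack
    return x[: t_out * stack].reshape(t_out, stack * d)

x = np.zeros((100, 16))       # 100 encoder frames of dimension 16
y = reduce_frames(x, stack=4)
assert y.shape == (25, 64)    # 4x fewer frames for downstream layers
```

Fewer output frames mean less work for the decoder, at the price of a coarser time resolution.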
no code implementations • 13 Dec 2023 • Oscar Chang, Dongseong Hwang, Olivier Siohan
In this work, we revisit the entropy semiring for neural speech recognition models, and show how alignment entropy can be used to supervise models through regularization or distillation.
no code implementations • 22 Sep 2023 • Weiran Wang, Rohit Prabhavalkar, Dongseong Hwang, Qiujia Li, Khe Chai Sim, Bo Li, James Qin, Xingyu Cai, Adam Stooke, Zhong Meng, CJ Zheng, Yanzhang He, Tara Sainath, Pedro Moreno Mengibar
In this work, we investigate two popular end-to-end automatic speech recognition (ASR) models, namely Connectionist Temporal Classification (CTC) and RNN-Transducer (RNN-T), for offline recognition of voice search queries, with up to 2B model parameters.
Automatic Speech Recognition (ASR)
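For context on the CTC model studied here, CTC's decoding rule (collapse repeated labels, then drop blanks) fits in a few lines; this is the standard CTC convention, not anything specific to the 2B-parameter models in the paper:

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse consecutive repeats, then drop blanks --
    the standard CTC mapping from per-frame labels to an output sequence."""
    out, prev = [], None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return out

# blank between the two 1s keeps them as distinct output tokens
assert ctc_greedy_decode([0, 1, 1, 0, 1, 2, 2, 0]) == [1, 1, 2]
```

RNN-T differs chiefly in adding a label-dependent prediction network, so its output distribution conditions on previously emitted tokens rather than assuming frame-wise independence.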
no code implementations • 16 Sep 2023 • Shefali Garg, Zhouyuan Huo, Khe Chai Sim, Suzan Schwartz, Mason Chua, Alëna Aksënova, Tsendsuren Munkhdalai, Levi King, Darryl Wright, Zion Mengesha, Dongseong Hwang, Tara Sainath, Françoise Beaufays, Pedro Moreno Mengibar
By combining the classifier output with coarse geographic information, we can select a subset of utterances from a large corpus of untranscribed short-form queries for semi-supervised learning at scale.
no code implementations • 31 May 2023 • Dongseong Hwang, Changwan Ryu, Khe Chai Sim
RNN-T is currently considered the industry standard in ASR, owing to its low word error rates (WERs) across benchmarks and its support for seamless streaming and long-form transcription.
no code implementations • 22 May 2023 • Qiujia Li, Bo Li, Dongseong Hwang, Tara N. Sainath, Pedro M. Mengibar
Speech data from different domains has distinct acoustic and linguistic characteristics.
no code implementations • 3 Feb 2023 • Bo Li, Dongseong Hwang, Zhouyuan Huo, Junwen Bai, Guru Prakash, Tara N. Sainath, Khe Chai Sim, Yu Zhang, Wei Han, Trevor Strohman, Francoise Beaufays
The FM encoder adapter and decoder are then finetuned to the target domain with a small amount of supervised in-domain data.
no code implementations • 4 Nov 2022 • Zhouyuan Huo, Khe Chai Sim, Bo Li, Dongseong Hwang, Tara N. Sainath, Trevor Strohman
Experimental results show that the proposed method outperforms existing algorithms on speech recognition while using fewer trainable parameters, less memory during training, and less training time.
Automatic Speech Recognition (ASR)
no code implementations • 11 Oct 2022 • Dongseong Hwang, Khe Chai Sim, Yu Zhang, Trevor Strohman
Knowledge distillation is an effective machine learning technique to transfer knowledge from a teacher model to a smaller student model, especially with unlabeled data.
Automatic Speech Recognition (ASR)
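The teacher-to-student transfer described above is commonly realized as a temperature-softened KL loss between teacher and student outputs; the logits and temperature below are illustrative assumptions, not the paper's values:

```python
import numpy as np

def softmax(z, temperature=1.0):
    z = z / temperature
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) at a raised temperature: the teacher's softened
    distribution serves as a soft label, so no ground truth is needed."""
    p = softmax(teacher_logits, temperature)   # teacher soft targets
    q = softmax(student_logits, temperature)   # student predictions
    return float(np.sum(p * (np.log(p) - np.log(q))))

teacher = np.array([[2.0, 0.5, -1.0]])
student = np.array([[1.5, 0.7, -0.5]])
loss = distill_loss(student, teacher)
assert loss >= 0.0  # KL divergence is non-negative
```

Because the loss depends only on model outputs, it applies directly to unlabeled data, which is the setting the summary highlights.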
no code implementations • 13 Apr 2022 • Shaojin Ding, Weiran Wang, Ding Zhao, Tara N. Sainath, Yanzhang He, Robert David, Rami Botros, Xin Wang, Rina Panigrahy, Qiao Liang, Dongseong Hwang, Ian McGraw, Rohit Prabhavalkar, Trevor Strohman
In this paper, we propose a dynamic cascaded encoder Automatic Speech Recognition (ASR) model, which unifies models for different deployment scenarios.
Automatic Speech Recognition (ASR)
no code implementations • 22 Mar 2022 • Dongseong Hwang, Khe Chai Sim, Zhouyuan Huo, Trevor Strohman
State-of-the-art automatic speech recognition (ASR) systems are trained with tens of thousands of hours of labeled speech data.
Automatic Speech Recognition (ASR)
no code implementations • 1 Oct 2021 • Zhouyuan Huo, Dongseong Hwang, Khe Chai Sim, Shefali Garg, Ananya Misra, Nikhil Siddhartha, Trevor Strohman, Françoise Beaufays
These models are typically trained on the server using transcribed speech data.
no code implementations • 1 Oct 2021 • Dongseong Hwang, Ananya Misra, Zhouyuan Huo, Nikhil Siddhartha, Shefali Garg, David Qiu, Khe Chai Sim, Trevor Strohman, Françoise Beaufays, Yanzhang He
Self- and semi-supervised learning methods have been actively investigated to reduce the amount of labeled training data required or to enhance model performance.