no code implementations • 14 Feb 2024 • Ruchao Fan, Natarajan Balaji Shankar, Abeer Alwan
UniEnc-CASSNAT uses a single encoder as its main module, and that encoder can be a speech foundation model (SFM).
1 code implementation • 28 Apr 2023 • Ruchao Fan, Yunzheng Zhu, Jinhan Wang, Abeer Alwan
With the proposed methods (E-APC and DRAFT), the relative WER improvements are even larger (30% and 19% on the OGI and MyST data, respectively) when compared to models trained without pretraining.
Automatic Speech Recognition (ASR) +4
no code implementations • 15 Apr 2023 • Ruchao Fan, Wei Chu, Peng Chang, Abeer Alwan
During inference, an error-based alignment sampling method is investigated in depth to reduce the alignment mismatch between the training and testing processes.
Automatic Speech Recognition (ASR) +4
no code implementations • 16 Oct 2022 • Ruchao Fan, Guoli Ye, Yashesh Gaur, Jinyu Li
As a result, we reduce the WER of a streaming TT from 7.6% to 6.5% on the Librispeech test-other data and the CER from 7.3% to 6.1% on the Aishell test data.
no code implementations • 16 Oct 2022 • Ruchao Fan, Yiming Wang, Yashesh Gaur, Jinyu Li
We examine CTCBERT on IDs from HuBERT Iter1, HuBERT Iter2, and PBERT.
no code implementations • 16 Jun 2022 • Ruchao Fan, Abeer Alwan
However, models trained through SSL are biased toward the pretraining data, which usually differs from the data used in finetuning tasks. This causes a domain shift problem and limits knowledge transfer.
Automatic Speech Recognition (ASR) +3
no code implementations • 24 Feb 2022 • Yunzheng Zhu, Ruchao Fan, Abeer Alwan
When data are scarce, the model might overfit to the training data, and hence good starting points for training are essential.
Automatic Speech Recognition (ASR) +2
no code implementations • 19 Feb 2022 • Alexander Johnson, Ruchao Fan, Robin Morris, Abeer Alwan
This paper proposes a novel linear prediction coding-based data augmentation method for children's low- and zero-resource dialect ASR.
no code implementations • 18 Jun 2021 • Jinhan Wang, Yunzheng Zhu, Ruchao Fan, Wei Chu, Abeer Alwan
About 5 hours of transcribed data and about 60 hours of untranscribed data are provided to develop a German ASR system for children.
no code implementations • 18 Jun 2021 • Ruchao Fan, Wei Chu, Peng Chang, Jing Xiao, Abeer Alwan
For the analyses, we plot attention weight distributions in the decoders to visualize the relationships between token-level acoustic embeddings.
Automatic Speech Recognition (ASR) +3
no code implementations • 18 Feb 2021 • Gary Yeung, Ruchao Fan, Abeer Alwan
Because of the lack of publicly available young child speech data, feature extraction strategies such as feature normalization and data augmentation must be considered to successfully train child ASR systems.
Automatic Speech Recognition (ASR) +2
no code implementations • 12 Feb 2021 • Ruchao Fan, Amber Afshan, Abeer Alwan
We present a bidirectional unsupervised model pre-training (UPT) method and apply it to children's automatic speech recognition (ASR).
Automatic Speech Recognition (ASR) +2
no code implementations • 28 Oct 2020 • Ruchao Fan, Wei Chu, Peng Chang, Jing Xiao
The information is used to extract an acoustic representation for each token in parallel, referred to as a token-level acoustic embedding, which substitutes for the word embedding in the autoregressive transformer (AT) to achieve parallel generation in the decoder.
no code implementations • 8 Aug 2020 • Vijay Ravi, Ruchao Fan, Amber Afshan, Huanhua Lu, Abeer Alwan
A fusion of the x-vector/PLDA baseline and the SID/PLDA scores prior to PID fusion further improved performance by 15%, indicating that the proposed approach is complementary to the x-vector system.
no code implementations • 1 Nov 2019 • Pan Zhou, Ruchao Fan, Wei Chen, Jia Jia
The Transformer has recently shown promising results in many sequence-to-sequence transformation tasks.
no code implementations • 13 Nov 2018 • Ruchao Fan, Pan Zhou, Wei Chen, Jia Jia, Gang Liu
In previous work, researchers have shown that such architectures can achieve results comparable to state-of-the-art ASR systems, especially when using a bidirectional encoder and a global soft attention (GSA) mechanism.
Automatic Speech Recognition (ASR) +2