no code implementations • 31 Oct 2023 • Yiwen Shao, Shi-Xiong Zhang, Dong Yu
Multi-channel multi-talker automatic speech recognition (ASR) remains an open challenge for the speech community, particularly under strong reverberation.
Automatic Speech Recognition (ASR)
no code implementations • 25 Oct 2023 • Zili Huang, Yiwen Shao, Shi-Xiong Zhang, Dong Yu
Multi-Task Capability: Beyond the single-task focus of previous systems, UniX-Encoder acts as a robust upstream model that extracts features for diverse tasks, including ASR and speaker recognition.
no code implementations • 5 Oct 2023 • Yiwen Shao
Multi-channel multi-talker speech recognition is a formidable challenge in speech processing because of issues such as background noise, reverberation, and overlapping speech.
no code implementations • 8 Apr 2022 • Sonal Joshi, Saurabh Kataria, Yiwen Shao, Piotr Żelasko, Jesus Villalba, Sanjeev Khudanpur, Najim Dehak
We propose three defenses: a denoiser pre-processor, adversarial fine-tuning of the ASR model, and adversarial fine-tuning of a joint ASR and denoiser model.
Automatic Speech Recognition (ASR)
no code implementations • 22 Nov 2021 • Yiwen Shao, Shi-Xiong Zhang, Dong Yu
Experimental results show that 1) the proposed ALL-In-One model achieved an error rate comparable to the pipelined system while halving the inference time; and 2) the proposed 3D spatial feature significantly outperformed all previous works that use 1D directional information, in both paradigms, with a 31% character error rate reduction (CERR).
Automatic Speech Recognition (ASR)
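The two headline numbers above are relative reductions. A minimal sketch of the arithmetic, assuming CERR means a relative (not absolute) reduction in character error rate; the CER values and timings below are hypothetical and are not taken from the paper.

```python
def relative_reduction(baseline: float, proposed: float) -> float:
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - proposed) / baseline


# Hypothetical character error rates (%), for illustration only.
baseline_cer = 10.0
proposed_cer = 6.9
print(f"CERR: {relative_reduction(baseline_cer, proposed_cer):.1f}%")  # CERR: 31.0%

# "Halving the inference time" is a 50% relative reduction (a 2x speed-up).
# Hypothetical timings in seconds.
pipeline_time, all_in_one_time = 4.0, 2.0
print(f"time reduction: {relative_reduction(pipeline_time, all_in_one_time):.0f}%")  # time reduction: 50%
```

Under this reading, a 31% CERR means the new system makes roughly 31 fewer character errors for every 100 the baseline makes, regardless of the baseline's absolute CER.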
no code implementations • 31 Mar 2021 • Piotr Żelasko, Sonal Joshi, Yiwen Shao, Jesus Villalba, Jan Trmal, Najim Dehak, Sanjeev Khudanpur
We investigate two threat models: a denial-of-service scenario where fast gradient-sign method (FGSM) or weak projected gradient descent (PGD) attacks are used to degrade the model's word error rate (WER); and a targeted scenario where a more potent imperceptible attack forces the system to recognize a specific phrase.
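FGSM perturbs an input with a single step of size epsilon in the direction of the sign of the loss gradient. PGD repeats smaller steps and projects the result back into the epsilon-ball around the original input. The sketch below illustrates both on a hypothetical two-feature logistic-regression model rather than an ASR network. The weights, `epsilon`, `alpha`, and `steps` are illustrative assumptions, and this is not the attack configuration used in the paper.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def loss(w: List[float], b: float, x: List[float], y: int) -> float:
    """Binary cross-entropy of a logistic model on one example (y in {0, 1})."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    tiny = 1e-12  # guards against log(0)
    return -(y * math.log(p + tiny) + (1 - y) * math.log(1 - p + tiny))


def input_grad(w: List[float], b: float, x: List[float], y: int) -> List[float]:
    """Gradient of the loss w.r.t. the input x: (sigmoid(w.x + b) - y) * w."""
    g = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
    return [g * wi for wi in w]


def fgsm(w, b, x, y, epsilon: float) -> List[float]:
    """One signed-gradient step of size epsilon that increases the loss."""
    grad = input_grad(w, b, x, y)
    return [xi + epsilon * sign(gi) for xi, gi in zip(x, grad)]


def pgd(w, b, x, y, epsilon: float, alpha: float, steps: int) -> List[float]:
    """Iterated signed-gradient steps of size alpha, each followed by a
    projection onto the L-infinity ball of radius epsilon around x
    (no random initialisation)."""
    x_adv = list(x)
    for _ in range(steps):
        grad = input_grad(w, b, x_adv, y)
        x_adv = [xa + alpha * sign(gi) for xa, gi in zip(x_adv, grad)]
        x_adv = [min(max(xa, xi - epsilon), xi + epsilon) for xa, xi in zip(x_adv, x)]
    return x_adv


if __name__ == "__main__":
    w, b = [2.0, -1.0], 0.0   # hypothetical model
    x, y = [0.5, 0.2], 1      # hypothetical clean input and label
    print("clean loss:", loss(w, b, x, y))
    print("FGSM loss: ", loss(w, b, fgsm(w, b, x, y, epsilon=0.1), y))
    print("PGD loss:  ", loss(w, b, pgd(w, b, x, y, epsilon=0.1, alpha=0.05, steps=5), y))
```

The denial-of-service and targeted scenarios differ in the objective being optimized. The first pushes the loss on the true transcript up, as here. The second would instead push down the loss on a chosen target phrase.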
1 code implementation • 20 May 2020 • Yiwen Shao, Yiming Wang, Daniel Povey, Sanjeev Khudanpur
We present PyChain, a fully parallelized PyTorch implementation of end-to-end lattice-free maximum mutual information (LF-MMI) training for the so-called "chain models" in the Kaldi automatic speech recognition (ASR) toolkit.
Automatic Speech Recognition (ASR)
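For reference, the standard MMI criterion that LF-MMI training maximizes can be written as follows. The notation is ours, not the paper's: O_u is the observation sequence of utterance u, w_u its reference transcript, M_w the HMM for word sequence w, and P(w) the language-model probability.

```latex
\mathcal{F}_{\mathrm{MMI}}
  = \sum_{u=1}^{U} \log
    \frac{p(\mathbf{O}_u \mid \mathcal{M}_{w_u})\, P(w_u)}
         {\sum_{w} p(\mathbf{O}_u \mid \mathcal{M}_{w})\, P(w)}
```

In the lattice-free variant, the denominator sum is computed with the forward-backward algorithm over a phone-level n-gram denominator graph, rather than approximated with per-utterance word lattices.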
1 code implementation • 14 Feb 2020 • Zili Huang, Shinji Watanabe, Yusuke Fujita, Paola Garcia, Yiwen Shao, Daniel Povey, Sanjeev Khudanpur
Speaker diarization is an important pre-processing step for many speech applications, and it aims to solve the "who spoke when" problem.
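To make the "who spoke when" framing concrete, here is a minimal sketch of the kind of output a diarization system produces: speaker-labelled time segments. The per-frame labels, the `frame_shift` value, and the function name are hypothetical, and this is not the method of the paper.

```python
from itertools import groupby
from typing import List, Optional, Tuple

Segment = Tuple[str, float, float]  # (speaker, start_seconds, end_seconds)


def frames_to_segments(labels: List[Optional[str]], frame_shift: float = 0.01) -> List[Segment]:
    """Collapse consecutive per-frame speaker labels into segments.

    `None` marks a non-speech frame. Each frame carries at most one label,
    so this toy representation cannot express overlapping speech.
    """
    segments: List[Segment] = []
    t = 0  # index of the first frame in the current run
    for speaker, run in groupby(labels):
        n = len(list(run))
        if speaker is not None:
            segments.append((speaker, t * frame_shift, (t + n) * frame_shift))
        t += n
    return segments


print(frames_to_segments(["A", "A", None, "B", "B", "B"], frame_shift=0.5))
# [('A', 0.0, 1.0), ('B', 1.5, 3.0)]
```

A real system must also handle overlapping speakers, which requires a representation that allows several active speakers per frame or overlapping segments.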
1 code implementation • 18 Sep 2019 • Yiming Wang, Tongfei Chen, Hainan Xu, Shuoyang Ding, Hang Lv, Yiwen Shao, Nanyun Peng, Lei Xie, Shinji Watanabe, Sanjeev Khudanpur
We present Espresso, an open-source, modular, extensible end-to-end neural automatic speech recognition (ASR) toolkit based on the deep learning library PyTorch and the popular neural machine translation toolkit fairseq.
Ranked #1 on Speech Recognition on Hub5'00 CallHome
Automatic Speech Recognition (ASR)