no code implementations • 20 Feb 2024 • Yanan Chen, Zihao Cui, Yingying Gao, Junlan Feng, Chao Deng, Shilei Zhang
In this study, we present a novel weighting prediction approach, which explicitly learns the task relationships from downstream training information to address the core challenge of universal speech enhancement.
no code implementations • 23 Oct 2023 • Yingying Gao, Shilei Zhang, Zihao Cui, Chao Deng, Junlan Feng
Cascading multiple pre-trained models is an effective way to compose an end-to-end system.
no code implementations • 20 Oct 2023 • Yingying Gao, Shilei Zhang, Zihao Cui, Yanhan Xu, Chao Deng, Junlan Feng
Self-supervised pre-trained models such as HuBERT and WavLM leverage unlabeled speech data for representation learning and offer significant improvements on numerous downstream tasks.
no code implementations • 26 Jun 2022 • Yingying Gao, Junlan Feng, Chao Deng, Shilei Zhang
Spoken language understanding (SLU) treats automatic speech recognition (ASR) and natural language understanding (NLU) as a unified task and usually suffers from data scarcity.
Automatic Speech Recognition (ASR)
no code implementations • 16 Jun 2022 • Yingying Gao, Junlan Feng, Tianrui Wang, Chao Deng, Shilei Zhang
Analysis shows that our proposed approach gives the trained model better uniformity and noticeably enlarges the CTC spikes.
Automatic Speech Recognition (ASR)
no code implementations • 1 Apr 2022 • Tianrui Wang, Weibin Zhu, Yingying Gao, Junlan Feng, Shilei Zhang
Joint training of a speech enhancement (SE) model and a speech recognition (ASR) model is a common solution for robust ASR in noisy environments.
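One common way to write such a joint objective is a weighted sum of the enhancement loss and the recognition loss, with gradients from the ASR loss flowing back into the SE front end. This is a generic formulation, not necessarily the one used in the listed paper. The weight $\lambda$, the noisy input $x$, the clean reference $s$, the transcript $y$, and the choice of per-task losses are assumptions for illustration.

$$
\hat{s} = \mathrm{SE}(x), \qquad
\mathcal{L}_{\text{joint}} = \lambda\,\mathcal{L}_{\text{SE}}(\hat{s}, s) + (1 - \lambda)\,\mathcal{L}_{\text{ASR}}\big(\mathrm{ASR}(\hat{s}), y\big), \qquad 0 \le \lambda \le 1
$$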
no code implementations • 25 Feb 2022 • Tianrui Wang, Weibin Zhu, Yingying Gao, Yanan Chen, Junlan Feng, Shilei Zhang
Therefore, we previously proposed a harmonic gated compensation network (HGCN) that predicts the full set of harmonic locations from the unmasked harmonics and processes the output of a coarse enhancement module to recover the masked harmonics.
1 code implementation • 30 Jan 2022 • Tianrui Wang, Weibin Zhu, Yingying Gao, Junlan Feng, Shilei Zhang
Mask processing in the time-frequency (T-F) domain with neural networks has been one of the mainstream approaches to single-channel speech enhancement.
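The following is a rough illustration of T-F masking in general, not the method of the listed paper. It computes an STFT with NumPy, applies an oracle ideal ratio mask (IRM) built from the known clean and noise signals, and resynthesizes the waveform by overlap-add. The frame size, hop, and the 440 Hz sine standing in for speech are hypothetical choices. The helper names `stft`, `istft`, and `ideal_ratio_mask` are also hypothetical. A real enhancement system would predict the mask with a neural network from the noisy spectrum alone.

```python
import numpy as np

N_FFT, HOP = 256, 128  # hypothetical frame size and 50% hop
WINDOW = np.hanning(N_FFT + 1)[:-1]  # periodic Hann: copies shifted by HOP sum to 1


def stft(x):
    """Windowed frames -> complex T-F matrix of shape (frames, N_FFT // 2 + 1)."""
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[i * HOP:i * HOP + N_FFT] * WINDOW for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec, length):
    """Inverse FFT per frame plus overlap-add; exact in the interior since windows sum to 1."""
    frames = np.fft.irfft(spec, n=N_FFT, axis=1)
    out = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * HOP:i * HOP + N_FFT] += frame
    return out


def ideal_ratio_mask(clean_spec, noise_spec, eps=1e-12):
    """Oracle mask in [0, 1]: near 1 where speech dominates a bin, near 0 where noise does."""
    s_pow = np.abs(clean_spec) ** 2
    n_pow = np.abs(noise_spec) ** 2
    return np.sqrt(s_pow / (s_pow + n_pow + eps))


def snr_db(ref, est):
    """Signal-to-noise ratio of an estimate against a reference, in dB."""
    return 10 * np.log10(np.sum(ref ** 2) / np.sum((ref - est) ** 2))


rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
clean = np.sin(2 * np.pi * 440 * t)  # stand-in for a speech signal
noise = 0.5 * rng.standard_normal(sr)
noisy = clean + noise

# Element-wise masking of the noisy spectrum, then resynthesis.
mask = ideal_ratio_mask(stft(clean), stft(noise))
enhanced = istft(mask * stft(noisy), len(noisy))

inner = slice(N_FFT, len(noisy) - N_FFT)  # skip edges not covered by two overlapping frames
snr_noisy = snr_db(clean[inner], noisy[inner])
snr_enhanced = snr_db(clean[inner], enhanced[inner])
print(f"noisy SNR:    {snr_noisy:.1f} dB")
print(f"enhanced SNR: {snr_enhanced:.1f} dB")
```

The mask multiplies the noisy spectrum bin by bin, so the same resynthesis path works for any mask a network outputs. The oracle IRM only serves as a convenient reference point for what masking can achieve.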