1 code implementation • 17 May 2023 • Jie Zhang, Qing-Tian Xu, Qiu-Shi Zhu, Zhen-Hua Ling
In this paper, we thus propose a novel time-domain brain-assisted SE network (BASEN) incorporating electroencephalography (EEG) signals recorded from the listener for extracting the target speaker from monaural speech mixtures.
no code implementations • 16 Feb 2023 • Xiao-Ying Zhao, Qiu-Shi Zhu, Jie Zhang
With advances in deep learning, neural network based speech enhancement (SE) has developed rapidly in the last decade.
1 code implementation • 27 Oct 2022 • Qiu-Shi Zhu, Long Zhou, Jie Zhang, Shu-Jie Liu, Yu-Chen Hu, Li-Rong Dai
Self-supervised pre-training methods based on contrastive learning or regression tasks can utilize more unlabeled data to improve the performance of automatic speech recognition (ASR).
Automatic Speech Recognition (ASR) +4
no code implementations • 28 Sep 2022 • Xiao-Ying Zhao, Qiu-Shi Zhu, Jie Zhang
Specifically, the encoder and bottleneck layer of the DEMUCS model are initialized with the self-supervised pretrained WavLM model, the convolutions in the encoder are replaced by causal convolutions, and the transformer encoder in the bottleneck layer uses a causal attention mask.
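The two causality changes mentioned above can be illustrated with a minimal sketch. This is not the actual DEMUCS/WavLM code; the function names and the use of NumPy are assumptions for illustration only. A causal convolution left-pads the input so the output at time t depends only on inputs up to t, and a causal attention mask is lower-triangular so each position attends only to itself and the past:

```python
import numpy as np

def causal_conv1d(x, kernel):
    """Causal 1-D convolution (illustrative sketch, not the DEMUCS layer):
    left-pad by (k - 1) so output[t] depends only on x[0..t]."""
    k = len(kernel)
    padded = np.concatenate([np.zeros(k - 1), np.asarray(x, dtype=float)])
    # Standard convolution flips the kernel before the sliding dot product.
    return np.array([np.dot(padded[t:t + k], kernel[::-1]) for t in range(len(x))])

def causal_attention_mask(T):
    """Lower-triangular boolean mask: position t may attend to positions <= t."""
    return np.tril(np.ones((T, T), dtype=bool))

out = causal_conv1d([1.0, 2.0, 3.0], np.array([1.0, 1.0]))
mask = causal_attention_mask(3)
```

With kernel [1, 1], each output is the sum of the current and previous sample, so `out` is [1, 3, 5]; the mask forbids attending to future positions (e.g. `mask[0, 1]` is False).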
no code implementations • 26 May 2022 • Qiu-Shi Zhu, Jie Zhang, Zi-Qiang Zhang, Li-Rong Dai
Speech enhancement (SE) is usually required as a front end to improve the speech quality in noisy environments, while the enhanced speech might not be optimal for automatic speech recognition (ASR) systems due to speech distortion.
Automatic Speech Recognition (ASR) +2
no code implementations • 5 Apr 2022 • Ye-Qian Du, Jie Zhang, Qiu-Shi Zhu, Li-Rong Dai, Ming-Hui Wu, Xin Fang, Zhou-Wang Yang
Unpaired data has been shown to be beneficial for low-resource automatic speech recognition (ASR), and can be involved in the design of hybrid models with multi-task training or language-model-dependent pre-training.
Automatic Speech Recognition (ASR) +2
no code implementations • 22 Jan 2022 • Xing-Yu Chen, Qiu-Shi Zhu, Jie Zhang, Li-Rong Dai
We first train an individual model for each of the three tasks on its acoustic signals, then average the parameters of these models to obtain an average model, which is used as the initialization for the BiLSTM model training of each task.
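The parameter-averaging step described above can be sketched in a few lines. This is a simplified illustration, assuming each task model is represented as a dict of named NumPy arrays; the function name and parameter names are hypothetical, not taken from the paper:

```python
import numpy as np

def average_models(state_dicts):
    """Element-wise average of per-task model parameters (illustrative sketch).

    state_dicts: list of {parameter_name: ndarray}, one per task model,
    all sharing the same parameter names and shapes.
    Returns a single dict usable as a shared initialization.
    """
    avg = {}
    for name in state_dicts[0]:
        avg[name] = np.mean([sd[name] for sd in state_dicts], axis=0)
    return avg

# Two toy task models with one weight matrix each.
task_a = {"w": np.array([0.0, 2.0])}
task_b = {"w": np.array([2.0, 4.0])}
init = average_models([task_a, task_b])
```

Here `init["w"]` is [1, 3], the element-wise mean; each task's BiLSTM would then be fine-tuned starting from this shared initialization.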
no code implementations • 22 Jan 2022 • Qiu-Shi Zhu, Jie Zhang, Zi-Qiang Zhang, Ming-Hui Wu, Xin Fang, Li-Rong Dai
In this work, we therefore first analyze the noise robustness of wav2vec2.0 via experiments.
Automatic Speech Recognition (ASR) +2