no code implementations • 14 Dec 2023 • Zengrui Jin, Xurong Xie, Tianzi Wang, Mengzhe Geng, Jiajun Deng, Guinan Li, Shujie Hu, Xunying Liu
Automatic recognition of disordered speech remains a highly challenging task to date due to data scarcity.
no code implementations • 26 Jun 2023 • Jiajun Deng, Guinan Li, Xurong Xie, Zengrui Jin, Mingyu Cui, Tianzi Wang, Shujie Hu, Mengzhe Geng, Xunying Liu
Rich sources of variability in natural speech present significant challenges to current data intensive speech recognition technologies.
no code implementations • 18 May 2023 • Mengzhe Geng, Zengrui Jin, Tianzi Wang, Shujie Hu, Jiajun Deng, Mingyu Cui, Guinan Li, Jianwei Yu, Xurong Xie, Xunying Liu
A key challenge in dysarthric speech recognition is the speaker-level diversity attributed to both speaker-identity associated factors such as gender, and speech impairment severity.
no code implementations • 28 Feb 2023 • Shujie Hu, Xurong Xie, Zengrui Jin, Mengzhe Geng, Yi Wang, Mingyu Cui, Jiajun Deng, Xunying Liu, Helen Meng
Experiments conducted on the UASpeech dysarthric and DementiaBank Pitt elderly speech corpora suggest TDNN and Conformer ASR systems integrated domain adapted wav2vec2. 0 models consistently outperform the standalone wav2vec2. 0 models by statistically significant WER reductions of 8. 22% and 3. 43% absolute (26. 71% and 15. 88% relative) on the two tasks respectively.
1 code implementation • 15 Feb 2023 • Jiajun Deng, Xurong Xie, Tianzi Wang, Mingyu Cui, Boyang Xue, Zengrui Jin, Guinan Li, Shujie Hu, Xunying Liu
Practical application of unsupervised model-based speaker adaptation techniques to data intensive end-to-end ASR systems is hindered by the scarcity of speaker-level data and performance sensitivity to transcription errors.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +2
no code implementations • 17 Nov 2022 • Xurong Xie, Xunying Liu, Hui Chen, Hongan Wang
Modeling the speaker variability is a key challenge for automatic speech recognition (ASR) systems.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
no code implementations • 3 Nov 2022 • Zengrui Jin, Xurong Xie, Mengzhe Geng, Tianzi Wang, Shujie Hu, Jiajun Deng, Guinan Li, Xunying Liu
After LHUC speaker adaptation, the best system using VAE-GAN based augmentation produced an overall WER of 27. 78% on the UASpeech test set of 16 dysarthric speakers, and the lowest published WER of 57. 31% on the subset of speakers with "Very Low" intelligibility.
no code implementations • 24 Jun 2022 • Jiajun Deng, Xurong Xie, Tianzi Wang, Mingyu Cui, Boyang Xue, Zengrui Jin, Mengzhe Geng, Guinan Li, Xunying Liu, Helen Meng
A key challenge for automatic speech recognition (ASR) systems is to model the speaker level variability.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
no code implementations • 23 Jun 2022 • Mingyu Cui, Jiajun Deng, Shoukang Hu, Xurong Xie, Tianzi Wang, Shujie Hu, Mengzhe Geng, Boyang Xue, Xunying Liu, Helen Meng
Fundamental modelling differences between hybrid and end-to-end (E2E) automatic speech recognition (ASR) systems create large diversity and complementarity among them.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
no code implementations • 15 Jun 2022 • Shujie Hu, Xurong Xie, Mengzhe Geng, Mingyu Cui, Jiajun Deng, Guinan Li, Tianzi Wang, Xunying Liu, Helen Meng
Articulatory features are inherently invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition (ASR) systems designed for normal speech.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +2
no code implementations • 28 Mar 2022 • Mengzhe Geng, Xurong Xie, Rongfeng Su, Jianwei Yu, Zengrui Jin, Tianzi Wang, Shujie Hu, Zi Ye, Helen Meng, Xunying Liu
Accurate recognition of dysarthric and elderly speech remain challenging tasks to date.
no code implementations • 19 Mar 2022 • Shujie Hu, Shansong Liu, Xurong Xie, Mengzhe Geng, Tianzi Wang, Shoukang Hu, Mingyu Cui, Xunying Liu, Helen Meng
Articulatory features are inherently invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition (ASR) systems for normal speech.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +2
no code implementations • 21 Feb 2022 • Mengzhe Geng, Xurong Xie, Zi Ye, Tianzi Wang, Guinan Li, Shujie Hu, Xunying Liu, Helen Meng
Motivated by the spectro-temporal level differences between dysarthric, elderly and normal speech that systematically manifest in articulatory imprecision, decreased volume and clarity, slower speaking rates and increased dysfluencies, novel spectrotemporal subspace basis deep embedding features derived using SVD speech spectrum decomposition are proposed in this paper to facilitate auxiliary feature based speaker adaptation of state-of-the-art hybrid DNN/TDNN and end-to-end Conformer speech recognition systems.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
no code implementations • 24 Jan 2022 • Xurong Xie, Rukiye Ruzi, Xunying Liu, Lan Wang
Dysarthric speech recognition is a challenging task due to acoustic variability and limited amount of available data.
no code implementations • 24 Jan 2022 • Xurong Xie, Xiang Sui, Xunying Liu, Lan Wang
Meanwhile, approaches of multi-accent modelling including multi-style training, multi-accent decision tree state tying, DNN tandem and multi-level adaptive network (MLAN) tandem hidden Markov model (HMM) modelling are combined and compared in this paper.
no code implementations • 15 Jan 2022 • Shansong Liu, Mengzhe Geng, Shoukang Hu, Xurong Xie, Mingyu Cui, Jianwei Yu, Xunying Liu, Helen Meng
Despite the rapid progress of automatic speech recognition (ASR) technologies in the past few decades, recognition of disordered speech remains a highly challenging task to date.
Audio-Visual Speech Recognition Automatic Speech Recognition +4
no code implementations • 14 Jan 2022 • Mengzhe Geng, Shansong Liu, Jianwei Yu, Xurong Xie, Shoukang Hu, Zi Ye, Zengrui Jin, Xunying Liu, Helen Meng
Automatic recognition of disordered speech remains a highly challenging task to date.
no code implementations • 14 Jan 2022 • Mengzhe Geng, Xurong Xie, Shansong Liu, Jianwei Yu, Shoukang Hu, Xunying Liu, Helen Meng
This paper investigates a set of data augmentation techniques for disordered speech recognition, including vocal tract length perturbation (VTLP), tempo perturbation and speed perturbation.
1 code implementation • 8 Jan 2022 • Shoukang Hu, Xurong Xie, Mingyu Cui, Jiajun Deng, Shansong Liu, Jianwei Yu, Mengzhe Geng, Xunying Liu, Helen Meng
State-of-the-art automatic speech recognition (ASR) system development is data and computation intensive.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +2
no code implementations • 18 Aug 2021 • Jin Li, Rongfeng Su, Xurong Xie, Nan Yan, Lan Wang
The shallow stream is used to acquire traditional shallow features that is beneficial for the classification of phones or words while the deep stream is used to obtain utterance-level speaker-invariant deep features for improving the feature diversity.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +2
no code implementations • 2 Aug 2021 • Zengrui Jin, Mengzhe Geng, Xurong Xie, Jianwei Yu, Shansong Liu, Xunying Liu, Helen Meng
Automatic recognition of disordered speech remains a highly challenging task to date.
1 code implementation • 14 Dec 2020 • Xurong Xie, Xunying Liu, Tan Lee, Lan Wang
A key task for speech recognition systems is to reduce the mismatch between training and evaluation data that is often attributable to speaker differences.
no code implementations • 8 Dec 2020 • Shoukang Hu, Xurong Xie, Shansong Liu, Jianwei Yu, Zi Ye, Mengzhe Geng, Xunying Liu, Helen Meng
On a third cross domain adaptation task requiring rapidly porting a 1000 hour LibriSpeech data trained system to a small DementiaBank elderly speech corpus, the proposed Bayesian TDNN LF-MMI systems outperformed the baseline system using direct weight fine-tuning by up to 2. 5\% absolute WER reduction.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +3
no code implementations • 17 Jul 2020 • Shoukang Hu, Xurong Xie, Shansong Liu, Mingyu Cui, Mengzhe Geng, Xunying Liu, Helen Meng
Deep neural networks (DNNs) based automatic speech recognition (ASR) systems are often designed using expert knowledge and empirical evaluation.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +2