no code implementations • 8 Jan 2024 • Wei Liu, Jingyong Hou, Dong Yang, Muyong Cao, Tan Lee
Many factors have separately shown their effectiveness on improving multilingual ASR.
no code implementations • 8 Jan 2024 • Yusheng Tian, Jingyu Li, Tan Lee
Experimental results on a real case of a tongue cancer patient confirm that the synthetic voice achieves articulation quality comparable to unimpaired natural speech, while effectively maintaining the target speaker's individuality.
no code implementations • 22 Oct 2023 • Dehua Tao, Tan Lee, Harold Chui, Sarah Luk
Counseling is usually conducted through spoken conversation between a therapist and a client.
no code implementations • 22 Oct 2023 • Dehua Tao, Tan Lee, Harold Chui, Sarah Luk
Counseling is carried out as spoken conversation between a therapist and a client.
no code implementations • 24 Sep 2023 • Jingyu Li, Tan Lee
The development of deep neural networks (DNN) has significantly enhanced the performance of speaker verification (SV) systems in recent years.
no code implementations • 21 Sep 2023 • Wei Liu, Ying Qin, Zhiyuan Peng, Tan Lee
Child speech, as a representative type of low-resource speech, is leveraged for adaptation.
Automatic Speech Recognition (ASR) +2
1 code implementation • 21 Sep 2023 • Wei Liu, Zhiyuan Peng, Tan Lee
The search process is carried out in two steps: (1) coarse search: to determine top $K$ candidates by pruning the most redundant layers based on the correlation matrix; (2) fine search: to select the best pruning proposal among $K$ candidates using a task-specific evaluation metric.
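The two-step search described above can be sketched as follows. This is an illustrative sketch, not the paper's actual code: the redundancy scoring, proposal generation, and the names `coarse_search`, `fine_search`, and `evaluate` are all assumptions made for illustration.

```python
import numpy as np

def coarse_search(corr, k, num_prune):
    """Coarse search: rank layers by redundancy (mean absolute correlation
    with the other layers, read off the layer-wise correlation matrix) and
    build k candidate proposals that each prune `num_prune` of the most
    redundant layers."""
    n = corr.shape[0]
    # redundancy of a layer = mean |correlation| with all other layers
    redundancy = (np.abs(corr).sum(axis=1) - 1.0) / (n - 1)
    order = np.argsort(-redundancy)  # most redundant first
    # slide a window over the ranked list to form k candidate proposals
    return [sorted(order[i:i + num_prune].tolist()) for i in range(k)]

def fine_search(proposals, evaluate):
    """Fine search: select the proposal with the best task-specific score."""
    return max(proposals, key=evaluate)

# toy example: layers 0 and 1 are highly correlated, hence most redundant
corr = np.array([[1.0, 0.9, 0.1, 0.1],
                 [0.9, 1.0, 0.1, 0.1],
                 [0.1, 0.1, 1.0, 0.2],
                 [0.1, 0.1, 0.2, 1.0]])
proposals = coarse_search(corr, k=2, num_prune=2)
best = fine_search(proposals, evaluate=lambda p: -sum(p))  # stand-in metric
```

Here `evaluate` stands in for the task-specific evaluation metric applied to each candidate in the fine-search step.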
no code implementations • 3 Jul 2023 • Yujia Xiao, Shaofei Zhang, Xi Wang, Xu Tan, Lei He, Sheng Zhao, Frank K. Soong, Tan Lee
Experiments show that ContextSpeech significantly improves the voice quality and prosody expressiveness in paragraph reading with competitive model efficiency.
no code implementations • 27 May 2023 • Yusheng Tian, Guangyan Zhang, Tan Lee
Specifically, a diffusion-based speech synthesis model is trained on original recordings, to capture and preserve the target speaker's original articulation style.
no code implementations • 26 May 2023 • Dehua Tao, Tan Lee, Harold Chui, Sarah Luk
Conversations with extreme values of empathy rating are used to train a Siamese network based encoder with contrastive loss.
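A minimal sketch of the contrastive objective such a Siamese encoder would be trained with, assuming the standard pairwise contrastive loss (pull same-label pairs together, push different-label pairs apart by a margin); the exact formulation in the paper may differ.

```python
import numpy as np

def contrastive_loss(emb_a, emb_b, same_label, margin=1.0):
    """Pairwise contrastive loss over Siamese embeddings.

    emb_a, emb_b: (batch, dim) embeddings of the two conversations in a pair.
    same_label:   (batch,) 1.0 if both have the same extreme empathy rating,
                  0.0 otherwise.
    """
    d = np.linalg.norm(emb_a - emb_b, axis=-1)               # Euclidean distance
    pos = same_label * d ** 2                                 # similar pairs: shrink distance
    neg = (1 - same_label) * np.maximum(margin - d, 0) ** 2   # dissimilar: enforce margin
    return float(np.mean(pos + neg))

# sanity check: identical embeddings of a same-label pair incur zero loss
a = np.zeros((1, 2))
b = np.array([[3.0, 4.0]])  # distance 5 from a
loss_same_close = contrastive_loss(a, a, np.array([1.0]))
loss_diff_far = contrastive_loss(a, b, np.array([0.0]))
```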
1 code implementation • 18 May 2023 • Yusheng Tian, Wei Liu, Tan Lee
One way to address this problem is to pre-enhance the speech with an enhancement model and then use the enhanced data for text-to-speech (TTS) model training.
no code implementations • 21 Feb 2023 • Wei Liu, Kaiqi Fu, Xiaohai Tian, Shuju Shi, Wei Li, Zejun Ma, Tan Lee
Recent studies on pronunciation scoring have explored introducing phone embeddings as reference pronunciation, but mostly in an implicit manner, i.e., by adding or concatenating the reference phone embedding with the actual pronunciation of the target phone to form the phone-level pronunciation quality representation.
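The implicit combination described above (addition or concatenation of the reference phone embedding with the actual pronunciation features) amounts to the following sketch; the function and argument names are illustrative, not taken from the paper's code.

```python
import numpy as np

def implicit_combine(ref_phone_emb, pron_feat, mode="concat"):
    """Form a phone-level pronunciation quality representation by implicitly
    fusing the reference phone embedding with the actual pronunciation
    features, either by element-wise addition or by concatenation."""
    if mode == "add":
        return ref_phone_emb + pron_feat        # requires matching dimensions
    return np.concatenate([ref_phone_emb, pron_feat], axis=-1)

ref = np.ones(4)    # toy reference phone embedding
feat = np.zeros(4)  # toy actual-pronunciation features
combined_cat = implicit_combine(ref, feat, mode="concat")
combined_add = implicit_combine(ref, feat, mode="add")
```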
no code implementations • 20 Feb 2023 • Wei Liu, Kaiqi Fu, Xiaohai Tian, Shuju Shi, Wei Li, Zejun Ma, Tan Lee
A typical fluency scoring system generally relies on an automatic speech recognition (ASR) system to obtain time stamps in input speech for either the subsequent calculation of fluency-related features or directly modeling speech fluency with an end-to-end approach.
Automatic Speech Recognition (ASR) +3
no code implementations • 31 Oct 2022 • Jingyu Li, Wei Liu, Zhaoyang Zhang, Jiong Wang, Tan Lee
Experimental results on VoxCeleb show that weight quantization is effective for compressing SV models.
no code implementations • 31 Oct 2022 • Jingyu Li, Yusheng Tian, Tan Lee
The weights are imposed on the input features to improve the representation ability for speaker modeling.
no code implementations • 29 Jun 2022 • Guangyan Zhang, Ying Qin, Wenjie Zhang, Jialun Wu, Mei Li, Yutao Gai, Feijun Jiang, Tan Lee
The emotion encoder extracts the emotion type as well as the respective emotion intensity from the mel-spectrogram of the input speech.
1 code implementation • 27 Jun 2022 • Xu Yang, Daoyuan Wu, Xiao Yi, Jimmy H. M. Lee, Tan Lee
In this paper, we propose iExam, an intelligent online exam monitoring and analysis system that not only uses face detection to assist invigilators in real-time student identification, but also detects common abnormal behaviors (including face disappearing, rotated faces, and replacement by a different person during the exams) via face recognition-based post-exam video analysis.
no code implementations • 26 Jun 2022 • Yusheng Tian, Jingyu Li, Tan Lee
Pooling is needed to aggregate frame-level features into utterance-level representations for speaker modeling.
no code implementations • 15 Jun 2022 • Jingyu Li, Wei Liu, Tan Lee
This paper proposes a domain transfer network, named EDITnet, to alleviate the language-mismatch problem on speaker embeddings without requiring speaker labels.
no code implementations • 15 Jun 2022 • Jingyu Li, Yusheng Tian, Tan Lee
There is no reason to expect that these features are optimal for all different tasks, including speaker verification (SV).
no code implementations • 2 Jun 2022 • Monira Islam, Tan Lee
In this study, the Multivariate Empirical Mode Decomposition (MEMD) approach is applied to extract features from multi-channel EEG signals for mental state classification.
no code implementations • 2 Jun 2022 • Monira Islam, Tan Lee
In this study, the Multivariate Empirical Mode Decomposition (MEMD) is applied to multichannel EEG to obtain scale-aligned intrinsic mode functions (IMFs) as input features for emotion detection.
no code implementations • 25 May 2022 • Wei Liu, Jingyu Li, Tan Lee
The performance of child speech recognition is generally less satisfactory than that of adult speech due to the limited amount of training data.
no code implementations • 12 Apr 2022 • Daxin Tan, Liqun Deng, Nianzu Zheng, Yu Ting Yeung, Xin Jiang, Xiao Chen, Tan Lee
This study proposes a fully automated system for speech correction and accent reduction.
no code implementations • 31 Mar 2022 • Dehua Tao, Tan Lee, Harold Chui, Sarah Luk
Counseling typically takes the form of spoken conversation between a therapist and a client.
no code implementations • 31 Mar 2022 • Guangyan Zhang, Kaitao Song, Xu Tan, Daxin Tan, Yuzi Yan, Yanqing Liu, Gang Wang, Wei Zhou, Tao Qin, Tan Lee, Sheng Zhao
However, these works apply pre-training with character-based units to enhance the TTS phoneme encoder, which is inconsistent with TTS fine-tuning that takes phonemes as input.
no code implementations • 29 Mar 2022 • Si-Ioi Ng, Cymie Wing-Yee Ng, Jiarui Wang, Tan Lee
This paper presents a macroscopic approach to automatic detection of speech sound disorder (SSD) in child speech.
no code implementations • 24 Mar 2022 • Dehua Tao, Tan Lee, Harold Chui, Sarah Luk
Using the salient utterance genres, an accuracy of 71% is achieved in classifying psychotherapy sessions into "high" and "low" empathy levels.
no code implementations • 20 Nov 2021 • Si-Ioi Ng, Rui-Si Ma, Tan Lee, Raymond Kim-Wai Sum
The general wellness of a person is further related to his/her physical literacy (PL), which refers to a holistic description of engagement in PA.
no code implementations • 9 Oct 2021 • Si-Ioi Ng, Tan Lee
The underlying objective is to explore the feasibility of deploying LTR speech in the training of end-to-end (E2E) ASR models, as an attempt at data augmentation for improving recognition performance.
Automatic Speech Recognition (ASR) +2
no code implementations • 8 Oct 2021 • Daxin Tan, Guangyan Zhang, Tan Lee
The key idea is to model the acoustic environment in speech audio as a factor of data variability and incorporate it as a condition in the process of neural network based speech synthesis.
no code implementations • 8 Oct 2021 • Guangyan Zhang, Yichong Leng, Daxin Tan, Ying Qin, Kaitao Song, Xu Tan, Sheng Zhao, Tan Lee
However, in terms of ultimately achieved system performance for target speaker(s), the actual benefits of model pre-training are uncertain and unstable, depending very much on the quantity and text content of training data.
no code implementations • 4 Oct 2021 • Ying Qin, Wei Liu, Zhiyuan Peng, Si-Ioi Ng, Jingyu Li, Haibo Hu, Tan Lee
Input to these classifiers are speech transcripts produced by automatic speech recognition (ASR) models.
Automatic Speech Recognition (ASR) +1
no code implementations • 20 Sep 2021 • Jingyu Li, Si-Ioi Ng, Tan Lee
Given the embeddings from a pair of input utterances, a graph model is designed to incorporate additional information from a group of embeddings representing the so-called auxiliary speakers.
no code implementations • 16 Sep 2021 • Wei Liu, Tan Lee
The investigation is focused on evaluating and comparing the efficacies of predictor features that are derived from different internal and external modules of the E2E system.
Automatic Speech Recognition (ASR) +2
no code implementations • 11 Aug 2021 • Yuzhong Wu, Tan Lee
For a more robust ASC system, we propose a robust feature learning (RFL) framework to train the CNN.
no code implementations • 5 Aug 2021 • Guangyan Zhang, Ying Qin, Daxin Tan, Tan Lee
This paper describes a novel design of a neural network-based speech generation model for learning prosodic representation. The problem of representation learning is formulated according to the information bottleneck (IB) principle.
1 code implementation • 4 Jul 2021 • Daxin Tan, Liqun Deng, Yu Ting Yeung, Xin Jiang, Xiao Chen, Tan Lee
This paper presents the design, implementation and evaluation of a speech editing system, named EditSpeech, which allows a user to perform deletion, insertion and replacement of words in a given speech utterance, without causing audible degradation in speech quality and naturalness.
no code implementations • 16 Jun 2021 • Si-Ioi Ng, Cymie Wing-Yee Ng, Jingyu Li, Tan Lee
This paper investigates a neural network based approach to detecting consonant errors in disordered speech using consonant-vowel (CV) diphone segments, in comparison to using consonant monophone segments.
no code implementations • 30 Mar 2021 • Shuiyang Mao, P. C. Ching, Tan Lee
Despite the widespread utilization of deep neural networks (DNNs) for speech emotion recognition (SER), they are severely restricted due to the paucity of labeled data for training.
no code implementations • 8 Mar 2021 • Daxin Tan, Hingpang Huang, Guangyan Zhang, Tan Lee
100 and 5 utterances of 3 target speakers with different voices and styles are provided in tracks 1 and 2 respectively, and the participants are required to synthesize speech in the target speaker's voice and style.
1 code implementation • 14 Dec 2020 • Xurong Xie, Xunying Liu, Tan Lee, Lan Wang
A key task for speech recognition systems is to reduce the mismatch between training and evaluation data that is often attributable to speaker differences.
no code implementations • 28 Nov 2020 • Man-Ling Sung, Tan Lee
The Siamese/Triplet network is trained on the hypothesized examples to measure the similarity between two speech segments, and thereby performs re-clustering of all hypothesized subword sequences to achieve spoken term discovery.
no code implementations • 8 Nov 2020 • Daxin Tan, Tan Lee
By incorporating a style predictor, the proposed system can also be used for text-to-speech synthesis.
no code implementations • 3 Nov 2020 • Man-Ling Sung, Siyuan Feng, Tan Lee
With the acoustic models trained in an unsupervised manner, a given audio archive is represented by a pseudo transcription, from which spoken keywords can be discovered by string mining algorithms.
Automatic Speech Recognition (ASR) +2
no code implementations • 30 Oct 2019 • Zhiyuan Peng, Siyuan Feng, Tan Lee
The USM experiments on ZeroSpeech 2017 dataset verify that the frame tokenizer is able to capture linguistic content and the utterance embedder can acquire speaker-related information.
no code implementations • 9 Aug 2019 • Siyuan Feng, Tan Lee
Out-of-domain ASR systems can be applied to perform speaker adaptation with untranscribed training data of the target language, and to decode the training speech into frame-level labels for DNN training.
Automatic Speech Recognition (ASR) +3
no code implementations • 17 Jun 2019 • Siyuan Feng, Tan Lee
This study tackles unsupervised subword modeling in the zero-resource scenario, learning frame-level speech representation that is phonetically discriminative and speaker-invariant, using only untranscribed speech for target languages.
no code implementations • 17 Jun 2019 • Siyuan Feng, Tan Lee, Zhiyuan Peng
Experimental results on ZeroSpeech 2017 show that both approaches are effective, while the latter is more prominent, and that their combination brings further marginal improvement in the across-speaker condition.
no code implementations • 6 Jan 2019 • Yuzhong Wu, Tan Lee
Acoustic scene classification is the task of identifying the scene from which the audio signal is recorded.
no code implementations • 1 Nov 2017 • Yuzhong Wu, Tan Lee
Audio classification is the task of identifying the sound categories that are associated with a given audio signal.
Sound Audio and Speech Processing
no code implementations • 16 Oct 2014 • Haipeng Wang, Tan Lee
This paper describes a spoken keyword search system developed at the Chinese University of Hong Kong (CUHK) for the query by example search on speech (QUESST) task of MediaEval 2014.
Ranked #3 on Keyword Spotting on QUESST