Search Results for author: Tan Lee

Found 52 papers, 5 papers with code

Creating Personalized Synthetic Voices from Articulation Impaired Speech Using Augmented Reconstruction Loss

no code implementations • 8 Jan 2024 • Yusheng Tian, Jingyu Li, Tan Lee

Experimental results on a real case of tongue cancer patient confirm that the synthetic voice achieves comparable articulation quality to unimpaired natural speech, while effectively maintaining the target speaker's individuality.

Efficient Black-Box Speaker Verification Model Adaptation with Reprogramming and Backend Learning

no code implementations • 24 Sep 2023 • Jingyu Li, Tan Lee

The development of deep neural networks (DNN) has significantly enhanced the performance of speaker verification (SV) systems in recent years.

Domain Adaptation Speaker Verification

CoMFLP: Correlation Measure based Fast Search on ASR Layer Pruning

1 code implementation • 21 Sep 2023 • Wei Liu, Zhiyuan Peng, Tan Lee

The search process is carried out in two steps: (1) coarse search: to determine top $K$ candidates by pruning the most redundant layers based on the correlation matrix; (2) fine search: to select the best pruning proposal among $K$ candidates using a task-specific evaluation metric.

speech-recognition Speech Recognition
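
The two-step search summarized in the CoMFLP entry above can be illustrated with a minimal sketch. This is not the authors' implementation: the scoring rule (correlation between the input and output of a contiguous block of layers), the function names `coarse_search` and `fine_search`, and the `evaluate` callback are all assumptions made for illustration only.

```python
from typing import Callable, List, Sequence, Tuple

# A proposal is the inclusive range of layer indices to prune (hypothetical encoding).
Proposal = Tuple[int, int]


def coarse_search(corr: Sequence[Sequence[float]], n_prune: int, k: int) -> List[Proposal]:
    """Return the top-k contiguous blocks of n_prune layers, ranked by redundancy.

    Assumption: corr[i][j] is the correlation between the representations at
    layer boundaries i and j, so corr has (number of layers + 1) rows.
    """
    n_layers = len(corr) - 1
    scored = []
    for start in range(n_layers - n_prune + 1):
        end = start + n_prune
        # A high correlation between block input and block output suggests
        # that the layers in between change the representation very little.
        scored.append((corr[start][end], (start, end - 1)))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [proposal for _, proposal in scored[:k]]


def fine_search(candidates: List[Proposal], evaluate: Callable[[Proposal], float]) -> Proposal:
    """Pick the candidate with the lowest task-specific error (e.g. a word error rate)."""
    return min(candidates, key=evaluate)


# Toy example: 4 layers, so a 5x5 correlation matrix (values are made up).
toy_corr = [[0.0] * 5 for _ in range(5)]
toy_corr[0][2], toy_corr[1][3], toy_corr[2][4] = 0.5, 0.9, 0.8

top_candidates = coarse_search(toy_corr, n_prune=2, k=2)
# Made-up error rates standing in for a real evaluation run.
toy_errors = {(1, 2): 12.0, (2, 3): 10.5}
best = fine_search(top_candidates, lambda p: toy_errors[p])
```

The coarse step needs only the correlation matrix, so it is cheap; the expensive task-specific evaluation runs on just the $K$ shortlisted proposals.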

ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading

no code implementations • 3 Jul 2023 • Yujia Xiao, Shaofei Zhang, Xi Wang, Xu Tan, Lei He, Sheng Zhao, Frank K. Soong, Tan Lee

Experiments show that ContextSpeech significantly improves the voice quality and prosody expressiveness in paragraph reading with competitive model efficiency.

Sentence

Creating Personalized Synthetic Voices from Post-Glossectomy Speech with Guided Diffusion Models

no code implementations • 27 May 2023 • Yusheng Tian, Guangyan Zhang, Tan Lee

Specifically, a diffusion-based speech synthesis model is trained on original recordings, to capture and preserve the target speaker's original articulation style.

Speech Synthesis Voice Conversion

Learning Representation of Therapist Empathy in Counseling Conversation Using Siamese Hierarchical Attention Network

no code implementations • 26 May 2023 • Dehua Tao, Tan Lee, Harold Chui, Sarah Luk

Conversations with extreme values of empathy rating are used to train a Siamese network based encoder with contrastive loss.

Diffusion-Based Mel-Spectrogram Enhancement for Personalized Speech Synthesis with Found Data

1 code implementation • 18 May 2023 • Yusheng Tian, Wei Liu, Tan Lee

One way to address this problem is to pre-enhance the speech with an enhancement model and then use the enhanced data for text-to-speech (TTS) model training.

Speech Enhancement Speech Synthesis

Leveraging phone-level linguistic-acoustic similarity for utterance-level pronunciation scoring

no code implementations • 21 Feb 2023 • Wei Liu, Kaiqi Fu, Xiaohai Tian, Shuju Shi, Wei Li, Zejun Ma, Tan Lee

Recent studies on pronunciation scoring have explored the effect of introducing phone embeddings as reference pronunciation, but mostly in an implicit manner, i.e., by adding or concatenating the reference phone embedding and the actual pronunciation of the target phone to form the phone-level pronunciation quality representation.

An ASR-free Fluency Scoring Approach with Self-Supervised Learning

no code implementations • 20 Feb 2023 • Wei Liu, Kaiqi Fu, Xiaohai Tian, Shuju Shi, Wei Li, Zejun Ma, Tan Lee

A typical fluency scoring system generally relies on an automatic speech recognition (ASR) system to obtain time stamps in input speech for either the subsequent calculation of fluency-related features or directly modeling speech fluency with an end-to-end approach.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Convolution-Based Channel-Frequency Attention for Text-Independent Speaker Verification

no code implementations • 31 Oct 2022 • Jingyu Li, Yusheng Tian, Tan Lee

The weights are imposed on the input features to improve the representation ability for speaker modeling.

Text-Independent Speaker Verification

iExam: A Novel Online Exam Monitoring and Analysis System Based on Face Detection and Recognition

1 code implementation • 27 Jun 2022 • Xu Yang, Daoyuan Wu, Xiao Yi, Jimmy H. M. Lee, Tan Lee

In this paper, we propose iExam, an intelligent online exam monitoring and analysis system that not only uses face detection to assist invigilators in real-time student identification, but also detects common abnormal behaviors (including face disappearing, rotating faces, and replacing with a different person during the exams) via a face recognition-based post-exam video analysis.

Face Detection Face Recognition +2

Transport-Oriented Feature Aggregation for Speaker Embedding Learning

no code implementations • 26 Jun 2022 • Yusheng Tian, Jingyu Li, Tan Lee

Pooling is needed to aggregate frame-level features into utterance-level representations for speaker modeling.

Speaker Verification

EDITnet: A Lightweight Network for Unsupervised Domain Adaptation in Speaker Verification

no code implementations • 15 Jun 2022 • Jingyu Li, Wei Liu, Tan Lee

This paper proposes a domain transfer network, named EDITnet, to alleviate the language-mismatch problem on speaker embeddings without requiring speaker labels.

Self-Supervised Learning Speaker Verification +1

Learnable Frequency Filters for Speech Feature Extraction in Speaker Verification

no code implementations • 15 Jun 2022 • Jingyu Li, Yusheng Tian, Tan Lee

There is no reason to expect that these features are optimal for all different tasks, including speaker verification (SV).

Speaker Verification

Multivariate Empirical Mode Decomposition of EEG for Mental State Detection at Localized Brain Lobes

no code implementations • 2 Jun 2022 • Monira Islam, Tan Lee

In this study, the Multivariate Empirical Mode Decomposition (MEMD) approach is applied to extract features from multi-channel EEG signals for mental state classification.

EEG

MEMD-HHT based Emotion Detection from EEG using 3D CNN

no code implementations • 2 Jun 2022 • Monira Islam, Tan Lee

In this study, the Multivariate Empirical Mode Decomposition (MEMD) is applied to multichannel EEG to obtain scale-aligned intrinsic mode functions (IMFs) as input features for emotion detection.

Binary Classification EEG

An Investigation on Applying Acoustic Feature Conversion to ASR of Adult and Child Speech

no code implementations • 25 May 2022 • Wei Liu, Jingyu Li, Tan Lee

The performance of child speech recognition is generally less satisfactory than that of adult speech recognition due to the limited amount of training data.

Attribute Automatic Speech Recognition +4

Hierarchical Attention Network for Evaluating Therapist Empathy in Counseling Session

no code implementations • 31 Mar 2022 • Dehua Tao, Tan Lee, Harold Chui, Sarah Luk

Counseling typically takes the form of spoken conversation between a therapist and a client.

Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech

no code implementations • 31 Mar 2022 • Guangyan Zhang, Kaitao Song, Xu Tan, Daxin Tan, Yuzi Yan, Yanqing Liu, Gang Wang, Wei Zhou, Tao Qin, Tan Lee, Sheng Zhao

However, the works apply pre-training with character-based units to enhance the TTS phoneme encoder, which is inconsistent with the TTS fine-tuning that takes phonemes as input.

Automatic Detection of Speech Sound Disorder in Child Speech Using Posterior-based Speaker Representations

no code implementations • 29 Mar 2022 • Si-Ioi Ng, Cymie Wing-Yee Ng, Jiarui Wang, Tan Lee

This paper presents a macroscopic approach to automatic detection of speech sound disorder (SSD) in child speech.

Speaker Verification

Characterizing Therapist's Speaking Style in Relation to Empathy in Psychotherapy

no code implementations • 24 Mar 2022 • Dehua Tao, Tan Lee, Harold Chui, Sarah Luk

Using the salient utterance genres, an accuracy of 71% is achieved in classifying psychotherapy sessions into "high" and "low" empathy levels.

Relation

Acoustical Analysis of Speech Under Physical Stress in Relation to Physical Activities and Physical Literacy

no code implementations • 20 Nov 2021 • Si-Ioi Ng, Rui-Si Ma, Tan Lee, Raymond Kim-Wai Sum

The general wellness of a person is further related to his/her physical literacy (PL), which refers to a holistic description of engagement in PA.

Relation

Data Augmentation with Locally-time Reversed Speech for Automatic Speech Recognition

no code implementations • 9 Oct 2021 • Si-Ioi Ng, Tan Lee

The underlying objective is to explore the feasibility of deploying LTR speech in the training of end-to-end (E2E) ASR models, as an attempt at data augmentation for improving the recognition performance.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Environment Aware Text-to-Speech Synthesis

no code implementations • 8 Oct 2021 • Daxin Tan, Guangyan Zhang, Tan Lee

The key idea is to model the acoustic environment in speech audio as a factor of data variability and incorporate it as a condition in the process of neural network based speech synthesis.

Attribute Disentanglement +2

A study on the efficacy of model pre-training in developing neural text-to-speech system

no code implementations • 8 Oct 2021 • Guangyan Zhang, Yichong Leng, Daxin Tan, Ying Qin, Kaitao Song, Xu Tan, Sheng Zhao, Tan Lee

However, in terms of ultimately achieved system performance for target speaker(s), the actual benefits of model pre-training are uncertain and unstable, depending very much on the quantity and text content of training data.

Computational Efficiency

Improving Text-Independent Speaker Verification with Auxiliary Speakers Using Graph

no code implementations • 20 Sep 2021 • Jingyu Li, Si-Ioi Ng, Tan Lee

Given the embeddings from a pair of input utterances, a graph model is designed to incorporate additional information from a group of embeddings representing the so-called auxiliary speakers.

Text-Independent Speaker Verification

Utterance-level neural confidence measure for end-to-end children speech recognition

no code implementations • 16 Sep 2021 • Wei Liu, Tan Lee

The investigation is focused on evaluating and comparing the efficacies of predictor features that are derived from different internal and external modules of the E2E system.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Applying the Information Bottleneck Principle to Prosodic Representation Learning

no code implementations • 5 Aug 2021 • Guangyan Zhang, Ying Qin, Daxin Tan, Tan Lee

This paper describes a novel design of a neural network-based speech generation model for learning prosodic representation. The problem of representation learning is formulated according to the information bottleneck (IB) principle.

Representation Learning

EditSpeech: A Text Based Speech Editing System Using Partial Inference and Bidirectional Fusion

1 code implementation • 4 Jul 2021 • Daxin Tan, Liqun Deng, Yu Ting Yeung, Xin Jiang, Xiao Chen, Tan Lee

This paper presents the design, implementation and evaluation of a speech editing system, named EditSpeech, which allows a user to perform deletion, insertion and replacement of words in a given speech utterance, without causing audible degradation in speech quality and naturalness.

Detection of Consonant Errors in Disordered Speech Based on Consonant-vowel Segment Embedding

no code implementations • 16 Jun 2021 • Si-Ioi Ng, Cymie Wing-Yee Ng, Jingyu Li, Tan Lee

This paper investigates a neural network based approach to detecting consonant errors in disordered speech using consonant-vowel (CV) diphone segment in comparison to using consonant monophone segment.

Enhancing Segment-Based Speech Emotion Recognition by Deep Self-Learning

no code implementations • 30 Mar 2021 • Shuiyang Mao, P. C. Ching, Tan Lee

Despite the widespread utilization of deep neural networks (DNNs) for speech emotion recognition (SER), they are severely restricted due to the paucity of labeled data for training.

Self-Learning Speech Emotion Recognition

CUHK-EE Voice Cloning System for ICASSP 2021 M2VoC Challenge

no code implementations • 8 Mar 2021 • Daxin Tan, Hingpang Huang, Guangyan Zhang, Tan Lee

100 and 5 utterances of 3 target speakers with different voices and styles are provided in tracks 1 and 2, respectively, and participants are required to synthesize speech in the target speaker's voice and style.

Voice Cloning

Bayesian Learning for Deep Neural Network Adaptation

1 code implementation • 14 Dec 2020 • Xurong Xie, Xunying Liu, Tan Lee, Lan Wang

A key task for speech recognition systems is to reduce the mismatch between training and evaluation data that is often attributable to speaker differences.

speech-recognition Speech Recognition +1

Unsupervised Spoken Term Discovery Based on Re-clustering of Hypothesized Speech Segments with Siamese and Triplet Networks

no code implementations • 28 Nov 2020 • Man-Ling Sung, Tan Lee

The Siamese/Triplet network is trained on the hypothesized examples to measure the similarity between two speech segments and hereby perform re-clustering of all hypothesized subword sequences to achieve spoken term discovery.

Clustering

Unsupervised Pattern Discovery from Thematic Speech Archives Based on Multilingual Bottleneck Features

no code implementations • 3 Nov 2020 • Man-Ling Sung, Siyuan Feng, Tan Lee

With the unsupervisedly trained acoustic models, a given audio archive is represented by a pseudo transcription, from which spoken keywords can be discovered by string mining algorithms.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Mixture factorized auto-encoder for unsupervised hierarchical deep factorization of speech signal

no code implementations • 30 Oct 2019 • Zhiyuan Peng, Siyuan Feng, Tan Lee

The USM experiments on ZeroSpeech 2017 dataset verify that the frame tokenizer is able to capture linguistic content and the utterance embedder can acquire speaker-related information.

Clustering Speaker Verification

Exploiting Cross-Lingual Speaker and Phonetic Diversity for Unsupervised Subword Modeling

no code implementations • 9 Aug 2019 • Siyuan Feng, Tan Lee

Out-of-domain ASR systems can be applied to perform speaker adaptation with untranscribed training data of the target language, and to decode the training speech into frame-level labels for DNN training.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Improving Unsupervised Subword Modeling via Disentangled Speech Representation Learning and Transformation

no code implementations • 17 Jun 2019 • Siyuan Feng, Tan Lee

This study tackles unsupervised subword modeling in the zero-resource scenario, learning frame-level speech representation that is phonetically discriminative and speaker-invariant, using only untranscribed speech for target languages.

Clustering Representation Learning

Combining Adversarial Training and Disentangled Speech Representation for Robust Zero-Resource Subword Modeling

no code implementations • 17 Jun 2019 • Siyuan Feng, Tan Lee, Zhiyuan Peng

Experimental results on ZeroSpeech 2017 show that both approaches are effective while the latter is more prominent, and that their combination brings further marginal improvement in across-speaker condition.

Representation Learning

Enhancing Sound Texture in CNN-Based Acoustic Scene Classification

no code implementations • 6 Jan 2019 • Yuzhong Wu, Tan Lee

Acoustic scene classification is the task of identifying the scene from which the audio signal is recorded.

Acoustic Scene Classification Classification +2

Reducing Model Complexity for DNN Based Large-Scale Audio Classification

no code implementations • 1 Nov 2017 • Yuzhong Wu, Tan Lee

Audio classification is the task of identifying the sound categories that are associated with a given audio signal.

Sound Audio and Speech Processing

CUHK System for QUESST Task of MediaEval 2014

no code implementations • 16 Oct 2014 • Haipeng Wang, Tan Lee

This paper describes a spoken keyword search system developed at the Chinese University of Hong Kong (CUHK) for the query by example search on speech (QUESST) task of MediaEval 2014.

Clustering Dynamic Time Warping +1
