no code implementations • 8 Jan 2024 • Wei Liu, Jingyong Hou, Dong Yang, Muyong Cao, Tan Lee
Many factors have separately shown their effectiveness on improving multilingual ASR.
no code implementations • 8 Jan 2024 • Yusheng Tian, Jingyu Li, Tan Lee
Experimental results on a real case of a tongue cancer patient confirm that the synthetic voice achieves articulation quality comparable to unimpaired natural speech, while effectively maintaining the target speaker's individuality.
no code implementations • 22 Oct 2023 • Dehua Tao, Tan Lee, Harold Chui, Sarah Luk
Counseling is usually conducted through spoken conversation between a therapist and a client.
no code implementations • 22 Oct 2023 • Dehua Tao, Tan Lee, Harold Chui, Sarah Luk
Counseling is carried out as spoken conversation between a therapist and a client.
no code implementations • 24 Sep 2023 • Jingyu Li, Tan Lee
The development of deep neural networks (DNN) has significantly enhanced the performance of speaker verification (SV) systems in recent years.
no code implementations • 21 Sep 2023 • Wei Liu, Ying Qin, Zhiyuan Peng, Tan Lee
Child speech, as a representative type of low-resource speech, is leveraged for adaptation.
Automatic Speech Recognition (ASR) +2
1 code implementation • 21 Sep 2023 • Wei Liu, Zhiyuan Peng, Tan Lee
The search process is carried out in two steps: (1) coarse search: to determine top $K$ candidates by pruning the most redundant layers based on the correlation matrix; (2) fine search: to select the best pruning proposal among $K$ candidates using a task-specific evaluation metric.
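The two-step search described above can be sketched as follows. This is an illustrative sketch, not the paper's actual code: the redundancy scoring, proposal generation, and the names `coarse_search`, `fine_search`, and `evaluate` are all assumptions made for illustration.

```python
import numpy as np

def coarse_search(corr, k, num_prune):
    """Coarse search: rank layers by redundancy (mean absolute correlation
    with the other layers, read off the layer-wise correlation matrix) and
    build k candidate proposals that each prune `num_prune` of the most
    redundant layers."""
    n = corr.shape[0]
    # redundancy of a layer = mean |correlation| with all other layers
    redundancy = (np.abs(corr).sum(axis=1) - 1.0) / (n - 1)
    order = np.argsort(-redundancy)  # most redundant first
    # slide a window over the ranked list to form k candidate proposals
    return [sorted(order[i:i + num_prune].tolist()) for i in range(k)]

def fine_search(proposals, evaluate):
    """Fine search: select the proposal with the best task-specific score."""
    return max(proposals, key=evaluate)

# toy example: layers 0 and 1 are highly correlated, hence most redundant
corr = np.array([[1.0, 0.9, 0.1, 0.1],
                 [0.9, 1.0, 0.1, 0.1],
                 [0.1, 0.1, 1.0, 0.2],
                 [0.1, 0.1, 0.2, 1.0]])
proposals = coarse_search(corr, k=2, num_prune=2)
best = fine_search(proposals, evaluate=lambda p: -sum(p))  # stand-in metric
```

Here `evaluate` stands in for the task-specific evaluation metric applied to each candidate in the fine-search step.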
no code implementations • 3 Jul 2023 • Yujia Xiao, Shaofei Zhang, Xi Wang, Xu Tan, Lei He, Sheng Zhao, Frank K. Soong, Tan Lee
Experiments show that ContextSpeech significantly improves the voice quality and prosody expressiveness in paragraph reading with competitive model efficiency.
no code implementations • 27 May 2023 • Yusheng Tian, Guangyan Zhang, Tan Lee
Specifically, a diffusion-based speech synthesis model is trained on original recordings, to capture and preserve the target speaker's original articulation style.
no code implementations • 26 May 2023 • Dehua Tao, Tan Lee, Harold Chui, Sarah Luk
Conversations with extreme values of empathy rating are used to train a Siamese network based encoder with contrastive loss.
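A minimal sketch of the contrastive objective such a Siamese encoder would be trained with, assuming the standard pairwise contrastive loss (pull same-label pairs together, push different-label pairs apart by a margin); the exact formulation in the paper may differ.

```python
import numpy as np

def contrastive_loss(emb_a, emb_b, same_label, margin=1.0):
    """Pairwise contrastive loss over Siamese embeddings.

    emb_a, emb_b: (batch, dim) embeddings of the two conversations in a pair.
    same_label:   (batch,) 1.0 if both have the same extreme empathy rating,
                  0.0 otherwise.
    """
    d = np.linalg.norm(emb_a - emb_b, axis=-1)               # Euclidean distance
    pos = same_label * d ** 2                                 # similar pairs: shrink distance
    neg = (1 - same_label) * np.maximum(margin - d, 0) ** 2   # dissimilar: enforce margin
    return float(np.mean(pos + neg))

# sanity check: identical embeddings of a same-label pair incur zero loss
a = np.zeros((1, 2))
b = np.array([[3.0, 4.0]])  # distance 5 from a
loss_same_close = contrastive_loss(a, a, np.array([1.0]))
loss_diff_far = contrastive_loss(a, b, np.array([0.0]))
```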
1 code implementation • 18 May 2023 • Yusheng Tian, Wei Liu, Tan Lee
One way to address this problem is to pre-enhance the speech with an enhancement model and then use the enhanced data for text-to-speech (TTS) model training.
no code implementations • 21 Feb 2023 • Wei Liu, Kaiqi Fu, Xiaohai Tian, Shuju Shi, Wei Li, Zejun Ma, Tan Lee
Recent studies on pronunciation scoring have explored introducing phone embeddings as reference pronunciation, but mostly in an implicit manner, i.e., by adding or concatenating the reference phone embedding with the actual pronunciation of the target phone to form the phone-level pronunciation quality representation.
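The implicit combination described above (addition or concatenation of the reference phone embedding with the actual pronunciation features) amounts to the following sketch; the function and argument names are illustrative, not taken from the paper's code.

```python
import numpy as np

def implicit_combine(ref_phone_emb, pron_feat, mode="concat"):
    """Form a phone-level pronunciation quality representation by implicitly
    fusing the reference phone embedding with the actual pronunciation
    features, either by element-wise addition or by concatenation."""
    if mode == "add":
        return ref_phone_emb + pron_feat        # requires matching dimensions
    return np.concatenate([ref_phone_emb, pron_feat], axis=-1)

ref = np.ones(4)    # toy reference phone embedding
feat = np.zeros(4)  # toy actual-pronunciation features
combined_cat = implicit_combine(ref, feat, mode="concat")
combined_add = implicit_combine(ref, feat, mode="add")
```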
no code implementations • 20 Feb 2023 • Wei Liu, Kaiqi Fu, Xiaohai Tian, Shuju Shi, Wei Li, Zejun Ma, Tan Lee
A typical fluency scoring system generally relies on an automatic speech recognition (ASR) system to obtain time stamps in input speech for either the subsequent calculation of fluency-related features or directly modeling speech fluency with an end-to-end approach.
Automatic Speech Recognition (ASR) +3
no code implementations • 31 Oct 2022 • Jingyu Li, Wei Liu, Zhaoyang Zhang, Jiong Wang, Tan Lee
Experimental results on VoxCeleb show that weight quantization is effective for compressing SV models.
no code implementations • 31 Oct 2022 • Jingyu Li, Yusheng Tian, Tan Lee
The weights are imposed on the input features to improve the representation ability for speaker modeling.
no code implementations • 29 Jun 2022 • Guangyan Zhang, Ying Qin, Wenjie Zhang, Jialun Wu, Mei Li, Yutao Gai, Feijun Jiang, Tan Lee
The emotion encoder extracts the emotion type as well as the respective emotion intensity from the mel-spectrogram of the input speech.
1 code implementation • 27 Jun 2022 • Xu Yang, Daoyuan Wu, Xiao Yi, Jimmy H. M. Lee, Tan Lee
In this paper, we propose iExam, an intelligent online exam monitoring and analysis system that not only uses face detection to assist invigilators in real-time student identification, but also detects common abnormal behaviors (including face disappearing, rotated faces, and replacement by a different person during the exams) via face recognition-based post-exam video analysis.
no code implementations • 26 Jun 2022 • Yusheng Tian, Jingyu Li, Tan Lee
Pooling is needed to aggregate frame-level features into utterance-level representations for speaker modeling.
no code implementations • 15 Jun 2022 • Jingyu Li, Wei Liu, Tan Lee
This paper proposes a domain transfer network, named EDITnet, to alleviate the language-mismatch problem on speaker embeddings without requiring speaker labels.
no code implementations • 15 Jun 2022 • Jingyu Li, Yusheng Tian, Tan Lee
There is no reason to expect that these features are optimal for all different tasks, including speaker verification (SV).
no code implementations • 2 Jun 2022 • Monira Islam, Tan Lee
In this study, the Multivariate Empirical Mode Decomposition (MEMD) approach is applied to extract features from multi-channel EEG signals for mental state classification.
no code implementations • 2 Jun 2022 • Monira Islam, Tan Lee
In this study, the Multivariate Empirical Mode Decomposition (MEMD) is applied to multichannel EEG to obtain scale-aligned intrinsic mode functions (IMFs) as input features for emotion detection.
no code implementations • 25 May 2022 • Wei Liu, Jingyu Li, Tan Lee
The performance of child speech recognition is generally less satisfactory than that of adult speech due to the limited amount of training data.
no code implementations • 12 Apr 2022 • Daxin Tan, Liqun Deng, Nianzu Zheng, Yu Ting Yeung, Xin Jiang, Xiao Chen, Tan Lee
This study proposes a fully automated system for speech correction and accent reduction.
no code implementations • 31 Mar 2022 • Dehua Tao, Tan Lee, Harold Chui, Sarah Luk
Counseling typically takes the form of spoken conversation between a therapist and a client.
no code implementations • 31 Mar 2022 • Guangyan Zhang, Kaitao Song, Xu Tan, Daxin Tan, Yuzi Yan, Yanqing Liu, Gang Wang, Wei Zhou, Tao Qin, Tan Lee, Sheng Zhao
However, these works apply pre-training with character-based units to enhance the TTS phoneme encoder, which is inconsistent with TTS fine-tuning that takes phonemes as input.
no code implementations • 29 Mar 2022 • Si-Ioi Ng, Cymie Wing-Yee Ng, Jiarui Wang, Tan Lee
This paper presents a macroscopic approach to automatic detection of speech sound disorder (SSD) in child speech.
no code implementations • 24 Mar 2022 • Dehua Tao, Tan Lee, Harold Chui, Sarah Luk
Using the salient utterance genres, an accuracy of 71% is achieved in classifying psychotherapy sessions into "high" and "low" empathy levels.
no code implementations • 20 Nov 2021 • Si-Ioi Ng, Rui-Si Ma, Tan Lee, Raymond Kim-Wai Sum
The general wellness of a person is further related to his/her physical literacy (PL), which refers to a holistic description of engagement in PA.
no code implementations • 9 Oct 2021 • Si-Ioi Ng, Tan Lee
The underlying objective is to explore the feasibility of deploying LTR speech in the training of end-to-end (E2E) ASR models, as an attempt at data augmentation for improving recognition performance.
Automatic Speech Recognition (ASR) +2
no code implementations • 8 Oct 2021 • Daxin Tan, Guangyan Zhang, Tan Lee
The key idea is to model the acoustic environment in speech audio as a factor of data variability and incorporate it as a condition in the process of neural network based speech synthesis.
no code implementations • 8 Oct 2021 • Guangyan Zhang, Yichong Leng, Daxin Tan, Ying Qin, Kaitao Song, Xu Tan, Sheng Zhao, Tan Lee
However, in terms of ultimately achieved system performance for target speaker(s), the actual benefits of model pre-training are uncertain and unstable, depending very much on the quantity and text content of training data.
no code implementations • 4 Oct 2021 • Ying Qin, Wei Liu, Zhiyuan Peng, Si-Ioi Ng, Jingyu Li, Haibo Hu, Tan Lee
Input to these classifiers are speech transcripts produced by automatic speech recognition (ASR) models.
Automatic Speech Recognition (ASR) +1
no code implementations • 20 Sep 2021 • Jingyu Li, Si-Ioi Ng, Tan Lee
Given the embeddings from a pair of input utterances, a graph model is designed to incorporate additional information from a group of embeddings representing the so-called auxiliary speakers.
no code implementations • 16 Sep 2021 • Wei Liu, Tan Lee
The investigation is focused on evaluating and comparing the efficacies of predictor features that are derived from different internal and external modules of the E2E system.
Automatic Speech Recognition (ASR) +2
no code implementations • 11 Aug 2021 • Yuzhong Wu, Tan Lee
For a more robust ASC system, we propose a robust feature learning (RFL) framework to train the CNN.
no code implementations • 5 Aug 2021 • Guangyan Zhang, Ying Qin, Daxin Tan, Tan Lee
This paper describes a novel design of a neural network-based speech generation model for learning prosodic representation. The problem of representation learning is formulated according to the information bottleneck (IB) principle.
1 code implementation • 4 Jul 2021 • Daxin Tan, Liqun Deng, Yu Ting Yeung, Xin Jiang, Xiao Chen, Tan Lee
This paper presents the design, implementation and evaluation of a speech editing system, named EditSpeech, which allows a user to perform deletion, insertion and replacement of words in a given speech utterance, without causing audible degradation in speech quality and naturalness.
no code implementations • 16 Jun 2021 • Si-Ioi Ng, Cymie Wing-Yee Ng, Jingyu Li, Tan Lee
This paper investigates a neural network based approach to detecting consonant errors in disordered speech using consonant-vowel (CV) diphone segments, in comparison to using consonant monophone segments.
no code implementations • 30 Mar 2021 • Shuiyang Mao, P. C. Ching, Tan Lee
Despite the widespread utilization of deep neural networks (DNNs) for speech emotion recognition (SER), they are severely restricted due to the paucity of labeled data for training.
no code implementations • 8 Mar 2021 • Daxin Tan, Hingpang Huang, Guangyan Zhang, Tan Lee
100 and 5 utterances of 3 target speakers with different voices and styles are provided in tracks 1 and 2 respectively, and the participants are required to synthesize speech in the target speaker's voice and style.
1 code implementation • 14 Dec 2020 • Xurong Xie, Xunying Liu, Tan Lee, Lan Wang
A key task for speech recognition systems is to reduce the mismatch between training and evaluation data that is often attributable to speaker differences.
no code implementations • 28 Nov 2020 • Man-Ling Sung, Tan Lee
The Siamese/Triplet network is trained on the hypothesized examples to measure the similarity between two speech segments, and thereby performs re-clustering of all hypothesized subword sequences to achieve spoken term discovery.
no code implementations • 8 Nov 2020 • Daxin Tan, Tan Lee
By incorporating a style predictor, the proposed system can also be used for text-to-speech synthesis.
no code implementations • 3 Nov 2020 • Man-Ling Sung, Siyuan Feng, Tan Lee
With the acoustic models trained in an unsupervised manner, a given audio archive is represented by a pseudo transcription, from which spoken keywords can be discovered by string mining algorithms.
Automatic Speech Recognition (ASR) +2
no code implementations • 30 Oct 2019 • Zhiyuan Peng, Siyuan Feng, Tan Lee
The USM experiments on ZeroSpeech 2017 dataset verify that the frame tokenizer is able to capture linguistic content and the utterance embedder can acquire speaker-related information.
no code implementations • 9 Aug 2019 • Siyuan Feng, Tan Lee
Out-of-domain ASR systems can be applied to perform speaker adaptation with untranscribed training data of the target language, and to decode the training speech into frame-level labels for DNN training.
Automatic Speech Recognition (ASR) +3
no code implementations • 17 Jun 2019 • Siyuan Feng, Tan Lee
This study tackles unsupervised subword modeling in the zero-resource scenario, learning frame-level speech representation that is phonetically discriminative and speaker-invariant, using only untranscribed speech for target languages.
no code implementations • 17 Jun 2019 • Siyuan Feng, Tan Lee, Zhiyuan Peng
Experimental results on ZeroSpeech 2017 show that both approaches are effective, while the latter is more prominent, and that their combination brings further marginal improvement in the across-speaker condition.
no code implementations • 6 Jan 2019 • Yuzhong Wu, Tan Lee
Acoustic scene classification is the task of identifying the scene from which the audio signal is recorded.
no code implementations • 1 Nov 2017 • Yuzhong Wu, Tan Lee
Audio classification is the task of identifying the sound categories that are associated with a given audio signal.
Sound Audio and Speech Processing
no code implementations • 16 Oct 2014 • Haipeng Wang, Tan Lee
This paper describes a spoken keyword search system developed at the Chinese University of Hong Kong (CUHK) for the query by example search on speech (QUESST) task of MediaEval 2014.
Ranked #3 on Keyword Spotting on QUESST