no code implementations • 23 Jan 2024 • Wei-Ping Huang, Sung-Feng Huang, Hung-Yi Lee
This paper presents an effective transfer learning framework for language adaptation in text-to-speech systems, focusing on achieving adaptation with minimal labeled and unlabeled data.
no code implementations • 29 Jul 2022 • Da-Rong Liu, Po-chun Hsu, Yi-Chen Chen, Sung-Feng Huang, Shun-Po Chuang, Da-Yi Wu, Hung-Yi Lee
GAN training is adopted in the first stage to find the mapping relationship between unpaired speech and phone sequences.
no code implementations • 27 Jun 2022 • Wei-Ping Huang, Po-Chun Chen, Sung-Feng Huang, Hung-Yi Lee
This paper studies a transferable phoneme embedding framework that aims to deal with the cross-lingual text-to-speech (TTS) problem under the few-shot setting.
1 code implementation • 7 Nov 2021 • Sung-Feng Huang, Chyi-Jiunn Lin, Da-Rong Liu, Yi-Chen Chen, Hung-Yi Lee
Speaker adaptation methods fine-tune a trained multi-speaker text-to-speech (TTS) model with a few enrolled samples.
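As a minimal toy illustration of this adaptation idea, the sketch below keeps a stand-in "synthesizer" frozen and fits only a new speaker embedding to a few enrolled samples. The function `adapt_speaker`, the additive speaker vector, and the toy data are hypothetical simplifications for illustration, not the paper's actual model.

```python
def adapt_speaker(base_synth, enrolled, steps=200, lr=0.1):
    """Few-shot speaker adaptation sketch: the multi-speaker model
    (here just `base_synth`) stays frozen; only a small speaker
    embedding `s` is fit so outputs match the enrolled samples."""
    d = len(enrolled[0][1])
    s = [0.0] * d  # new speaker's embedding, the only trainable part
    for _ in range(steps):
        grad = [0.0] * d
        for content, target in enrolled:
            # Toy model: output = frozen base features + speaker offset.
            pred = [b + si for b, si in zip(base_synth(content), s)]
            for i in range(d):
                grad[i] += 2 * (pred[i] - target[i])
        s = [si - lr * g / len(enrolled) for si, g in zip(s, grad)]
    return s

# Frozen "synthesizer": content id -> base acoustic features (toy data).
base = {0: [0.2, 0.4], 1: [0.5, 0.1]}
synth = lambda c: base[c]
# Two enrolled samples from the new speaker (features shifted by +0.3).
enrolled = [(0, [0.5, 0.7]), (1, [0.8, 0.4])]
s = adapt_speaker(synth, enrolled)
```

With only two enrolled samples the speaker offset converges to the shift shared by both targets, which is the point of adaptation: few samples suffice when only the speaker-specific part is updated.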
1 code implementation • 7 May 2021 • Yi-Chen Chen, Po-Han Chi, Shu-wen Yang, Kai-Wei Chang, Jheng-Hao Lin, Sung-Feng Huang, Da-Rong Liu, Chi-Liang Liu, Cheng-Kuang Lee, Hung-Yi Lee
Multi-task learning of a wide variety of speech processing tasks with a universal model has not been studied.
no code implementations • 6 Apr 2021 • Shun-Po Chuang, Heng-Jui Chang, Sung-Feng Huang, Hung-Yi Lee
Mandarin-English code-switching (CS) is frequently used among East and Southeast Asian people.
1 code implementation • 29 Oct 2020 • Sung-Feng Huang, Shun-Po Chuang, Da-Rong Liu, Yi-Chen Chen, Gene-Ping Yang, Hung-Yi Lee
Speech separation is well developed, thanks largely to the very successful permutation invariant training (PIT) approach; however, the frequent label-assignment switching that occurs during PIT training remains a problem when faster convergence and better achievable performance are desired.
Ranked #6 on Speech Separation on Libri2Mix (using extra training data)
1 code implementation • EMNLP 2020 • Cheng-Han Chiang, Sung-Feng Huang, Hung-Yi Lee
These findings suggest that the knowledge of a pretrained model varies during pretraining, and that more pretraining steps do not necessarily give a model more comprehensive knowledge.
no code implementations • 10 Apr 2019 • Yi-Chen Chen, Sung-Feng Huang, Hung-Yi Lee, Lin-shan Lee
However, we note that human babies start to learn language from the sounds (or phonetic structures) of a small number of exemplar words, and "generalize" such knowledge to other words without hearing large amounts of data.
no code implementations • 7 Nov 2018 • Sung-Feng Huang, Yi-Chen Chen, Hung-Yi Lee, Lin-shan Lee
Embedding audio signal segments into vectors of fixed dimensionality is attractive because all subsequent processing, such as modeling, classification, or indexing, becomes easier and more efficient.
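The appeal of fixed-dimensional embeddings can be seen with even the crudest pooling scheme: segments of different lengths map to vectors of the same size, so downstream comparison is a simple vector operation. The mean pooling below is a hypothetical stand-in for the learned sequence-to-vector encoders studied in this line of work.

```python
def mean_pool_embedding(frames):
    """Collapse a variable-length sequence of frame feature vectors
    into one fixed-dimensional vector by averaging each dimension.
    Real audio embeddings would use a learned encoder (RNN etc.);
    this only illustrates the fixed-size property."""
    dim = len(frames[0])
    n = len(frames)
    return [sum(f[d] for f in frames) / n for d in range(dim)]

short = [[1.0, 2.0], [3.0, 4.0]]               # 2 frames
longer = [[1.0, 2.0], [3.0, 4.0], [2.0, 3.0]]  # 3 frames
print(mean_pool_embedding(short))   # [2.0, 3.0]
print(mean_pool_embedding(longer))  # [2.0, 3.0] -- same size either way
```

Once every segment is a same-sized vector, indexing and classification reduce to standard vector-space operations, which is the efficiency argument the abstract makes.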
no code implementations • 30 Oct 2018 • Yi-Chen Chen, Chia-Hao Shen, Sung-Feng Huang, Hung-Yi Lee, Lin-shan Lee
This can be learned by aligning a small number of spoken words and the corresponding text words in the embedding spaces.
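A minimal version of such an alignment is a linear map fit on a handful of paired examples. The sketch below learns, by gradient descent on squared error, a matrix that sends "audio-word" embeddings onto "text-word" embeddings; `align_embeddings` and the toy rotated data are illustrative assumptions, not the paper's procedure.

```python
def align_embeddings(audio_vecs, text_vecs, steps=2000, lr=0.05):
    """Fit a linear map W minimizing sum ||W a - t||^2 over the few
    paired (audio, text) embedding examples, via gradient descent."""
    d = len(audio_vecs[0])
    W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    for _ in range(steps):
        grad = [[0.0] * d for _ in range(d)]
        for a, t in zip(audio_vecs, text_vecs):
            pred = [sum(W[i][j] * a[j] for j in range(d)) for i in range(d)]
            err = [p - y for p, y in zip(pred, t)]
            for i in range(d):
                for j in range(d):
                    grad[i][j] += 2 * err[i] * a[j]
        for i in range(d):
            for j in range(d):
                W[i][j] -= lr * grad[i][j] / len(audio_vecs)
    return W

# Three paired "words"; the text space is the audio space rotated 90 deg,
# so the recoverable map is W = [[0, -1], [1, 0]].
audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text = [[0.0, 1.0], [-1.0, 0.0], [-1.0, 1.0]]
W = align_embeddings(audio, text)
```

The point mirrored from the abstract: a small number of aligned pairs is enough to pin down the map, after which unseen audio embeddings can be projected into the text space for free.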
no code implementations • 21 Jul 2018 • Yi-Chen Chen, Sung-Feng Huang, Chia-Hao Shen, Hung-Yi Lee, Lin-shan Lee
Stage 1 performs phonetic embedding with speaker characteristics disentangled.
no code implementations • 29 Mar 2018 • Yi-Chen Chen, Chia-Hao Shen, Sung-Feng Huang, Hung-Yi Lee
In this work, we propose a framework to achieve unsupervised ASR on a read English speech dataset, where audio and text are unaligned.
Automatic Speech Recognition (ASR) +1