Search Results for author: Sung-Feng Huang

Found 13 papers, 4 papers with code

Maximizing Data Efficiency for Cross-Lingual TTS Adaptation by Self-Supervised Representation Mixing and Embedding Initialization

no code implementations • 23 Jan 2024 • Wei-Ping Huang, Sung-Feng Huang, Hung-Yi Lee

This paper presents an effective transfer learning framework for language adaptation in text-to-speech systems, focusing on achieving adaptation with minimal labeled and unlabeled data.

Transfer Learning

Few-Shot Cross-Lingual TTS Using Transferable Phoneme Embedding

no code implementations • 27 Jun 2022 • Wei-Ping Huang, Po-Chun Chen, Sung-Feng Huang, Hung-Yi Lee

This paper studies a transferable phoneme embedding framework that aims to deal with the cross-lingual text-to-speech (TTS) problem under the few-shot setting.

Few-Shot Learning • Transfer Learning

Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech

1 code implementation • 7 Nov 2021 • Sung-Feng Huang, Chyi-Jiunn Lin, Da-Rong Liu, Yi-Chen Chen, Hung-Yi Lee

On the one hand, speaker adaptation methods fine-tune a trained multi-speaker text-to-speech (TTS) model with few enrolled samples.

Meta-Learning • Speech Synthesis

Stabilizing Label Assignment for Speech Separation by Self-supervised Pre-training

1 code implementation • 29 Oct 2020 • Sung-Feng Huang, Shun-Po Chuang, Da-Rong Liu, Yi-Chen Chen, Gene-Ping Yang, Hung-Yi Lee

Speech separation is well developed thanks to the highly successful permutation invariant training (PIT) approach, but the frequent label-assignment switching that occurs during PIT training remains a problem when faster convergence and better achievable performance are desired.
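For context, the PIT objective the snippet refers to can be sketched as follows: the loss is evaluated under every speaker-label permutation and the minimum is kept, so training does not depend on the arbitrary ordering of the model's output channels. This is a minimal illustrative sketch, not the paper's implementation; the function names and the use of plain MSE on raw signals are assumptions.

```python
from itertools import permutations

def mse(a, b):
    # Mean squared error between two equal-length signals.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def pit_loss(estimates, targets):
    # estimates, targets: lists of per-speaker signals (lists of floats).
    # PIT computes the loss under every assignment of estimated outputs
    # to reference speakers and keeps the minimum, making the objective
    # invariant to the ordering of the separator's output channels.
    n = len(estimates)
    return min(
        sum(mse(estimates[i], targets[p[i]]) for i in range(n)) / n
        for p in permutations(range(n))
    )
```

Because the minimizing permutation can change between training steps, the "label assignment switching" the abstract mentions arises naturally; the paper's contribution is stabilizing it via self-supervised pre-training.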

Ranked #6 on Speech Separation on Libri2Mix (using extra training data)

Speaker Separation • Speech Enhancement +1

Pretrained Language Model Embryology: The Birth of ALBERT

1 code implementation • EMNLP 2020 • Cheng-Han Chiang, Sung-Feng Huang, Hung-Yi Lee

These findings suggest that knowledge of a pretrained model varies during pretraining, and having more pretrain steps does not necessarily provide a model with more comprehensive knowledge.

Language Modelling • POS +1

From Semi-supervised to Almost-unsupervised Speech Recognition with Very-low Resource by Jointly Learning Phonetic Structures from Audio and Text Embeddings

no code implementations • 10 Apr 2019 • Yi-Chen Chen, Sung-Feng Huang, Hung-Yi Lee, Lin-shan Lee

However, we note that human babies begin to learn language from the sounds (or phonetic structures) of a small number of exemplar words, and "generalize" such knowledge to other words without hearing large amounts of data.

Speech Recognition +1

Improved Audio Embeddings by Adjacency-Based Clustering with Applications in Spoken Term Detection

no code implementations • 7 Nov 2018 • Sung-Feng Huang, Yi-Chen Chen, Hung-Yi Lee, Lin-shan Lee

Embedding audio signal segments into vectors of fixed dimensionality is attractive because all subsequent processing, for example modeling, classification, or indexing, becomes easier and more efficient.

Clustering
