no code implementations • 23 Jan 2024 • Wei-Ping Huang, Sung-Feng Huang, Hung-Yi Lee
This paper presents an effective transfer learning framework for language adaptation in text-to-speech systems, focusing on achieving adaptation with minimal labeled and unlabeled data.
no code implementations • 29 Jul 2022 • Da-Rong Liu, Po-chun Hsu, Yi-Chen Chen, Sung-Feng Huang, Shun-Po Chuang, Da-Yi Wu, Hung-Yi Lee
GAN training is adopted in the first stage to find the mapping relationship between unpaired speech and phone sequences.
no code implementations • 27 Jun 2022 • Wei-Ping Huang, Po-Chun Chen, Sung-Feng Huang, Hung-Yi Lee
This paper studies a transferable phoneme embedding framework that aims to deal with the cross-lingual text-to-speech (TTS) problem under the few-shot setting.
1 code implementation • 7 Nov 2021 • Sung-Feng Huang, Chyi-Jiunn Lin, Da-Rong Liu, Yi-Chen Chen, Hung-Yi Lee
Speaker adaptation methods fine-tune a trained multi-speaker text-to-speech (TTS) model with a few enrolled samples.
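As a minimal toy illustration of this adaptation idea, the sketch below keeps a stand-in "synthesizer" frozen and fits only a new speaker embedding to a few enrolled samples. The function `adapt_speaker`, the additive speaker vector, and the toy data are hypothetical simplifications for illustration, not the paper's actual model.

```python
def adapt_speaker(base_synth, enrolled, steps=200, lr=0.1):
    """Few-shot speaker adaptation sketch: the multi-speaker model
    (here just `base_synth`) stays frozen; only a small speaker
    embedding `s` is fit so outputs match the enrolled samples."""
    d = len(enrolled[0][1])
    s = [0.0] * d  # new speaker's embedding, the only trainable part
    for _ in range(steps):
        grad = [0.0] * d
        for content, target in enrolled:
            # Toy model: output = frozen base features + speaker offset.
            pred = [b + si for b, si in zip(base_synth(content), s)]
            for i in range(d):
                grad[i] += 2 * (pred[i] - target[i])
        s = [si - lr * g / len(enrolled) for si, g in zip(s, grad)]
    return s

# Frozen "synthesizer": content id -> base acoustic features (toy data).
base = {0: [0.2, 0.4], 1: [0.5, 0.1]}
synth = lambda c: base[c]
# Two enrolled samples from the new speaker (features shifted by +0.3).
enrolled = [(0, [0.5, 0.7]), (1, [0.8, 0.4])]
s = adapt_speaker(synth, enrolled)
```

With only two enrolled samples the speaker offset converges to the shift shared by both targets, which is the point of adaptation: few samples suffice when only the speaker-specific part is updated.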
1 code implementation • 7 May 2021 • Yi-Chen Chen, Po-Han Chi, Shu-wen Yang, Kai-Wei Chang, Jheng-Hao Lin, Sung-Feng Huang, Da-Rong Liu, Chi-Liang Liu, Cheng-Kuang Lee, Hung-Yi Lee
Multi-task learning of a wide variety of speech processing tasks with a universal model has not been studied.
no code implementations • 6 Apr 2021 • Shun-Po Chuang, Heng-Jui Chang, Sung-Feng Huang, Hung-Yi Lee
Mandarin-English code-switching (CS) is frequently used among East and Southeast Asian people.
1 code implementation • 29 Oct 2020 • Sung-Feng Huang, Shun-Po Chuang, Da-Rong Liu, Yi-Chen Chen, Gene-Ping Yang, Hung-Yi Lee
Speech separation is well developed, thanks largely to the very successful permutation invariant training (PIT) approach; however, the frequent label-assignment switching that occurs during PIT training remains a problem when faster convergence and better achievable performance are desired.
Ranked #6 on Speech Separation on Libri2Mix (using extra training data)
1 code implementation • EMNLP 2020 • Cheng-Han Chiang, Sung-Feng Huang, Hung-Yi Lee
These findings suggest that the knowledge of a pretrained model varies during pretraining, and that more pretraining steps do not necessarily give a model more comprehensive knowledge.
no code implementations • 10 Apr 2019 • Yi-Chen Chen, Sung-Feng Huang, Hung-Yi Lee, Lin-shan Lee
However, we note that human babies start to learn language from the sounds (or phonetic structures) of a small number of exemplar words, and "generalize" such knowledge to other words without hearing large amounts of data.
no code implementations • 7 Nov 2018 • Sung-Feng Huang, Yi-Chen Chen, Hung-Yi Lee, Lin-shan Lee
Embedding audio signal segments into vectors of fixed dimensionality is attractive because all subsequent processing, such as modeling, classification, or indexing, becomes easier and more efficient.
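The appeal of fixed-dimensional embeddings can be seen with even the crudest pooling scheme: segments of different lengths map to vectors of the same size, so downstream comparison is a simple vector operation. The mean pooling below is a hypothetical stand-in for the learned sequence-to-vector encoders studied in this line of work.

```python
def mean_pool_embedding(frames):
    """Collapse a variable-length sequence of frame feature vectors
    into one fixed-dimensional vector by averaging each dimension.
    Real audio embeddings would use a learned encoder (RNN etc.);
    this only illustrates the fixed-size property."""
    dim = len(frames[0])
    n = len(frames)
    return [sum(f[d] for f in frames) / n for d in range(dim)]

short = [[1.0, 2.0], [3.0, 4.0]]               # 2 frames
longer = [[1.0, 2.0], [3.0, 4.0], [2.0, 3.0]]  # 3 frames
print(mean_pool_embedding(short))   # [2.0, 3.0]
print(mean_pool_embedding(longer))  # [2.0, 3.0] -- same size either way
```

Once every segment is a same-sized vector, indexing and classification reduce to standard vector-space operations, which is the efficiency argument the abstract makes.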
no code implementations • 30 Oct 2018 • Yi-Chen Chen, Chia-Hao Shen, Sung-Feng Huang, Hung-Yi Lee, Lin-shan Lee
This can be learned by aligning a small number of spoken words and the corresponding text words in the embedding spaces.
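A minimal version of such an alignment is a linear map fit on a handful of paired examples. The sketch below learns, by gradient descent on squared error, a matrix that sends "audio-word" embeddings onto "text-word" embeddings; `align_embeddings` and the toy rotated data are illustrative assumptions, not the paper's procedure.

```python
def align_embeddings(audio_vecs, text_vecs, steps=2000, lr=0.05):
    """Fit a linear map W minimizing sum ||W a - t||^2 over the few
    paired (audio, text) embedding examples, via gradient descent."""
    d = len(audio_vecs[0])
    W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    for _ in range(steps):
        grad = [[0.0] * d for _ in range(d)]
        for a, t in zip(audio_vecs, text_vecs):
            pred = [sum(W[i][j] * a[j] for j in range(d)) for i in range(d)]
            err = [p - y for p, y in zip(pred, t)]
            for i in range(d):
                for j in range(d):
                    grad[i][j] += 2 * err[i] * a[j]
        for i in range(d):
            for j in range(d):
                W[i][j] -= lr * grad[i][j] / len(audio_vecs)
    return W

# Three paired "words"; the text space is the audio space rotated 90 deg,
# so the recoverable map is W = [[0, -1], [1, 0]].
audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text = [[0.0, 1.0], [-1.0, 0.0], [-1.0, 1.0]]
W = align_embeddings(audio, text)
```

The point mirrored from the abstract: a small number of aligned pairs is enough to pin down the map, after which unseen audio embeddings can be projected into the text space for free.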
no code implementations • 21 Jul 2018 • Yi-Chen Chen, Sung-Feng Huang, Chia-Hao Shen, Hung-Yi Lee, Lin-shan Lee
Stage 1 performs phonetic embedding with speaker characteristics disentangled.
no code implementations • 29 Mar 2018 • Yi-Chen Chen, Chia-Hao Shen, Sung-Feng Huang, Hung-Yi Lee
In this work, we propose a framework to achieve unsupervised ASR on a read English speech dataset, where audio and text are unaligned.
Automatic Speech Recognition (ASR) +1