Search Results for author: Haohan Guo

Found 9 papers, 4 papers with code

BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data

no code implementations • 12 Feb 2024 • Mateusz Łajszczak, Guillermo Cámbara, Yang Li, Fatih Beyhan, Arent van Korlaar, Fan Yang, Arnaud Joly, Álvaro Martín-Cortinas, Ammar Abbas, Adam Michalski, Alexis Moinet, Sri Karlapati, Ewa Muszyńska, Haohan Guo, Bartosz Putrycz, Soledad López Gambino, Kayeon Yoo, Elena Sokolova, Thomas Drugman

Echoing the widely reported "emergent abilities" of large language models trained on increasing volumes of data, we show that BASE TTS variants built with 10K+ hours and 500M+ parameters begin to demonstrate natural prosody on textually complex sentences.

Disentanglement

Cross-Speaker Encoding Network for Multi-Talker Speech Recognition

no code implementations • 8 Jan 2024 • Jiawen Kang, Lingwei Meng, Mingyu Cui, Haohan Guo, Xixin Wu, Xunying Liu, Helen Meng

To the best of our knowledge, this work represents an early effort to integrate SIMO and SISO for multi-talker speech recognition.

Speech Recognition

QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning

1 code implementation • 31 Aug 2023 • Haohan Guo, Fenglong Xie, Jiawen Kang, Yujia Xiao, Xixin Wu, Helen Meng

This paper proposes a novel semi-supervised TTS framework, QS-TTS, which improves TTS quality with lower supervised-data requirements by applying Vector-Quantized Self-Supervised Speech Representation Learning (VQ-S3RL) to additional unlabeled speech audio.

Representation Learning • Speech Synthesis +2

Towards High-Quality Neural TTS for Low-Resource Languages by Learning Compact Speech Representations

1 code implementation • 27 Oct 2022 • Haohan Guo, Fenglong Xie, Xixin Wu, Hui Lu, Helen Meng

Moreover, we optimize the training strategy by leveraging more audio to learn MSMCRs better for low-resource languages.

Transfer Learning

A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS

1 code implementation • 22 Sep 2022 • Haohan Guo, Fenglong Xie, Frank K. Soong, Xixin Wu, Helen Meng

A vector-quantized variational autoencoder (VQ-VAE) based feature analyzer encodes the Mel spectrograms of the training speech by progressively down-sampling them in multiple stages into MSMC Representations (MSMCRs) with different time resolutions, quantizing each stage with multiple VQ codebooks.
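The multi-stage, multi-codebook encoding described above can be sketched as follows. This is an illustrative approximation, not the paper's architecture: the average-pooling down-sampler, the two-codebooks-per-stage split, and all shapes are assumptions for the sake of a runnable example.

```python
import numpy as np

def quantize(x, codebook):
    # Nearest-neighbour vector quantization: map each frame to its closest code.
    dists = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (T, K)
    idx = dists.argmin(1)
    return codebook[idx], idx

def msmc_encode(mel, codebooks, strides):
    """Multi-stage multi-codebook encoding (sketch).

    At each stage the sequence is average-pooled in time by `stride`, then
    each half of the feature vector is quantized with its own codebook
    (two codebooks per stage here, purely for illustration)."""
    reps, x = [], mel
    for stage_cbs, s in zip(codebooks, strides):
        T = (x.shape[0] // s) * s
        x = x[:T].reshape(-1, s, x.shape[1]).mean(1)       # down-sample in time
        halves = np.split(x, 2, axis=1)                    # one codebook per half
        q = np.concatenate(
            [quantize(h, cb)[0] for h, cb in zip(halves, stage_cbs)], axis=1)
        reps.append(q)                                     # MSMCR at this resolution
        x = q                                              # feed next stage
    return reps

rng = np.random.default_rng(0)
mel = rng.normal(size=(16, 8))                             # (frames, mel bins)
cbs = [[rng.normal(size=(4, 4)) for _ in range(2)] for _ in range(2)]
reps = msmc_encode(mel, cbs, strides=[2, 2])
```

Each stage halves the time resolution, so `reps[0]` has 8 frames and `reps[1]` has 4, each frame snapped to entries of that stage's codebooks.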

Phonetic Posteriorgrams based Many-to-Many Singing Voice Conversion via Adversarial Training

1 code implementation • 3 Dec 2020 • Haohan Guo, Heng Lu, Na Hu, Chunlei Zhang, Shan Yang, Lei Xie, Dan Su, Dong Yu

In order to make timbre conversion more stable and controllable, speaker embedding is further decomposed to the weighted sum of a group of trainable vectors representing different timbre clusters.

Audio Generation • Disentanglement +1
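The decomposition of a speaker embedding into a weighted sum of timbre-cluster vectors can be sketched in a few lines. All names, dimensions, and the softmax weighting are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def speaker_embedding(logits, timbre_vectors):
    """Compose a speaker embedding as a convex combination of a small bank
    of trainable timbre-cluster vectors (sketch; in training, both `logits`
    and `timbre_vectors` would be learned parameters)."""
    w = softmax(logits)           # (C,) mixing weights over timbre clusters
    return w @ timbre_vectors     # (D,) weighted sum = speaker embedding

clusters = np.eye(3)              # 3 toy timbre clusters in a 3-dim space
uniform = speaker_embedding(np.zeros(3), clusters)
skewed = speaker_embedding(np.array([10.0, 0.0, 0.0]), clusters)
```

With uniform logits the embedding is the mean of the clusters; pushing one logit up pulls the embedding toward that cluster, which is what makes the conversion controllable.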

Exploiting Syntactic Features in a Parsed Tree to Improve End-to-End TTS

no code implementations • 9 Apr 2019 • Haohan Guo, Frank K. Soong, Lei He, Lei Xie

End-to-end TTS, which predicts speech directly from a given sequence of graphemes or phonemes, has shown improved performance over conventional TTS.

Sentence

A New GAN-based End-to-End TTS Training Algorithm

no code implementations • 9 Apr 2019 • Haohan Guo, Frank K. Soong, Lei He, Lei Xie

However, autoregressive module training is affected by exposure bias, i.e., the mismatch between the distributions of the real data seen during training and the predicted data conditioned on at inference.

Generative Adversarial Network • Sentence +1
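Exposure bias, as described in the abstract above, can be demonstrated with a toy autoregressive rollout. The dynamics and the slightly biased "model" below are invented for illustration only: under teacher forcing the model always conditions on ground truth, so its error stays constant, while in free-running mode it conditions on its own outputs and the error compounds.

```python
import numpy as np

TRUE_DECAY = 0.5   # ground-truth dynamics: x_{t+1} = 0.5 * x_t
BIAS = 0.05        # a slightly mis-trained "model": predicts 0.5 * x_t + 0.05

def model_step(x):
    return TRUE_DECAY * x + BIAS

def rollout(x0, n, teacher=None):
    """Autoregressive rollout. With `teacher` (ground-truth inputs) each step
    conditions on real data; without it, on the model's own predictions."""
    preds, x = [], x0
    for t in range(n):
        inp = teacher[t] if teacher is not None else x
        x = model_step(inp)
        preds.append(x)
    return np.array(preds)

truth = np.array([TRUE_DECAY ** t for t in range(1, 11)])        # targets, x0 = 1
teacher_inputs = np.array([TRUE_DECAY ** t for t in range(10)])  # ground-truth inputs
tf_err = np.abs(rollout(1.0, 10, teacher_inputs) - truth)        # teacher-forced
fr_err = np.abs(rollout(1.0, 10) - truth)                        # free-running
```

The teacher-forced error is a flat 0.05 per step, while the free-running error grows as the model drifts toward its own fixed point; that growing gap is the train/inference mismatch adversarial training aims to close.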

Feature reinforcement with word embedding and parsing information in neural TTS

no code implementations • 3 Jan 2019 • Huaiping Ming, Lei He, Haohan Guo, Frank K. Soong

In this paper, we propose a feature reinforcement method under the sequence-to-sequence neural text-to-speech (TTS) synthesis framework.

Sentence
