Search Results for author: Guangyan Zhang

Found 10 papers, 1 paper with code

Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech

no code implementations • 31 Jul 2023 • Guangyan Zhang, Thomas Merritt, Manuel Sam Ribeiro, Biel Tura-Vecino, Kayoko Yanagisawa, Kamil Pokora, Abdelhamid Ezzerg, Sebastian Cygert, Ammar Abbas, Piotr Bilinski, Roberto Barra-Chicote, Daniel Korzekwa, Jaime Lorenzo-Trueba

Neural text-to-speech systems are often optimized on L1/L2 losses, which make strong assumptions about the distributions of the target data space.

Acoustic Modelling Speech Synthesis +1

Creating Personalized Synthetic Voices from Post-Glossectomy Speech with Guided Diffusion Models

no code implementations • 27 May 2023 • Yusheng Tian, Guangyan Zhang, Tan Lee

Specifically, a diffusion-based speech synthesis model is trained on original recordings, to capture and preserve the target speaker's original articulation style.

Speech Synthesis Voice Conversion

iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis based on Disentanglement between Prosody and Timbre

no code implementations • 29 Jun 2022 • Guangyan Zhang, Ying Qin, Wenjie Zhang, Jialun Wu, Mei Li, Yutao Gai, Feijun Jiang, Tan Lee

The emotion encoder extracts the identity of emotion type as well as the respective emotion intensity from the mel-spectrogram of input speech.

Disentanglement Speaker Identification +1

Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech

no code implementations • 31 Mar 2022 • Guangyan Zhang, Kaitao Song, Xu Tan, Daxin Tan, Yuzi Yan, Yanqing Liu, Gang Wang, Wei Zhou, Tao Qin, Tan Lee, Sheng Zhao

However, the works apply pre-training with character-based units to enhance the TTS phoneme encoder, which is inconsistent with the TTS fine-tuning that takes phonemes as input.

A study on the efficacy of model pre-training in developing neural text-to-speech system

no code implementations • 8 Oct 2021 • Guangyan Zhang, Yichong Leng, Daxin Tan, Ying Qin, Kaitao Song, Xu Tan, Sheng Zhao, Tan Lee

However, in terms of ultimately achieved system performance for target speaker(s), the actual benefits of model pre-training are uncertain and unstable, depending very much on the quantity and text content of training data.

Computational Efficiency

Environment Aware Text-to-Speech Synthesis

no code implementations • 8 Oct 2021 • Daxin Tan, Guangyan Zhang, Tan Lee

The key idea is to model the acoustic environment in speech audio as a factor of data variability and incorporate it as a condition in the process of neural network based speech synthesis.

Attribute Disentanglement +2

Applying the Information Bottleneck Principle to Prosodic Representation Learning

no code implementations • 5 Aug 2021 • Guangyan Zhang, Ying Qin, Daxin Tan, Tan Lee

This paper describes a novel design of a neural network-based speech generation model for learning prosodic representation. The problem of representation learning is formulated according to the information bottleneck (IB) principle.

Representation Learning

AdaSpeech 3: Adaptive Text to Speech for Spontaneous Style

no code implementations • 6 Jul 2021 • Yuzi Yan, Xu Tan, Bohan Li, Guangyan Zhang, Tao Qin, Sheng Zhao, Yuan Shen, Wei-Qiang Zhang, Tie-Yan Liu

While recent text to speech (TTS) models perform very well in synthesizing reading-style (e.g., audiobook) speech, it is still challenging to synthesize spontaneous-style speech (e.g., podcast or conversation), mainly because of two reasons: 1) the lack of training data for spontaneous speech; 2) the difficulty in modeling the filled pauses (um and uh) and diverse rhythms in spontaneous speech.

Decoder

CUHK-EE Voice Cloning System for ICASSP 2021 M2VoC Challenge

no code implementations • 8 Mar 2021 • Daxin Tan, Hingpang Huang, Guangyan Zhang, Tan Lee

100 and 5 utterances of 3 target speakers with different voices and styles are provided in tracks 1 and 2, respectively, and participants are required to synthesize speech in the target speaker's voice and style.

Voice Cloning

ItLnc-BXE: a Bagging-XGBoost-ensemble method with multiple features for identification of plant lncRNAs

1 code implementation • 1 Nov 2019 • Guangyan Zhang, Ziru Liu, Jichen Dai, Zilan Yu, Shuai Liu, Wen Zhang

However, most existing methods are designed for lncRNAs in animal systems, and only a few focus on plant lncRNA identification.

Ensemble Learning feature selection
