Text-to-Speech Models

Glow-TTS is a flow-based generative model for parallel TTS that does not require any external aligner. By combining the properties of flows and dynamic programming, the proposed model searches for the most probable monotonic alignment between text and the latent representation of speech. The model is directly trained to maximize the log-likelihood of speech with the alignment. Enforcing hard monotonic alignments helps enable robust TTS, which generalizes to long utterances, and employing flows enables fast, diverse, and controllable speech synthesis.

Source: Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search
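
The dynamic-programming search mentioned in the description can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes a precomputed matrix of log-likelihoods in which entry [i, j] scores latent frame j under the prior of text token i, and it assumes that every frame is assigned to exactly one token, that the alignment starts at the first token and ends at the last, and that no token is skipped. The function name `monotonic_alignment_search` and the argument `log_lik` are hypothetical names chosen for this example.

```python
import numpy as np


def monotonic_alignment_search(log_lik):
    """Find the most probable monotonic alignment (illustrative sketch).

    log_lik[i, j] is assumed to hold the log-likelihood of latent frame j
    under the prior of text token i. Returns (alignment, score), where
    alignment[j] is the token index assigned to frame j.
    """
    n_text, n_frames = log_lik.shape
    if n_frames < n_text:
        raise ValueError("need at least one latent frame per text token")

    # q[i, j]: best total log-likelihood of any monotonic path that ends
    # with frame j assigned to token i. Unreachable cells stay at -inf.
    q = np.full((n_text, n_frames), -np.inf)
    q[0, 0] = log_lik[0, 0]
    for j in range(1, n_frames):
        for i in range(min(j + 1, n_text)):
            stay = q[i, j - 1]  # previous frame used the same token
            advance = q[i - 1, j - 1] if i > 0 else -np.inf  # previous token
            q[i, j] = max(stay, advance) + log_lik[i, j]

    # Backtrack from the last token at the last frame to recover the path.
    alignment = np.zeros(n_frames, dtype=int)
    i = n_text - 1
    for j in range(n_frames - 1, -1, -1):
        alignment[j] = i
        if j > 0 and i > 0 and (i == j or q[i - 1, j - 1] > q[i, j - 1]):
            i -= 1
    return alignment, q[-1, -1]


# Toy input: token 0 fits the first two frames, token 1 the last two.
toy = np.array([[0.0, 0.0, -5.0, -5.0],
                [-5.0, -5.0, 0.0, 0.0]])
path, score = monotonic_alignment_search(toy)
print(path.tolist(), score)
```

The table is filled in time proportional to the number of tokens times the number of frames, and the number of frames assigned to each token can be read off the returned path as that token's duration.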

Tasks


Task                         Papers  Share
Text-To-Speech Synthesis     2       33.33%
Zero-Shot Multi-Speaker TTS  1       16.67%
Speech Synthesis             1       16.67%
Voice Conversion             1       16.67%
Word Alignment               1       16.67%

Components


Component  Type
GLOW       Generative Models