Glow-TTS

Introduced by Kim et al. in Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search

Glow-TTS is a flow-based generative model for parallel text-to-speech (TTS) that requires no external aligner. It combines the properties of flows with dynamic programming to search for the most probable monotonic alignment between the text and the latent representation of speech, and it is trained directly to maximize the log-likelihood of speech under that alignment. Enforcing hard monotonic alignments makes synthesis robust and lets it generalize to long utterances, while the use of flows enables fast, diverse, and controllable speech synthesis.
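
The alignment search described above can be illustrated with a small dynamic program. The following is a minimal sketch, not the authors' implementation: it assumes a precomputed matrix log_lik in which log_lik[i][j] is the log-likelihood of latent frame j under the prior of text token i, and it assumes every token must cover at least one frame with no skips. The function name monotonic_alignment_search and the toy matrix are hypothetical, chosen only for illustration.

```python
NEG_INF = float("-inf")


def monotonic_alignment_search(log_lik):
    """Return (alignment, score) for the best monotonic alignment.

    log_lik[i][j] is assumed to hold the log-likelihood of latent frame j
    under text token i. alignment[j] is the token index assigned to frame j.
    """
    n_text = len(log_lik)
    n_frames = len(log_lik[0])
    if n_text > n_frames:
        raise ValueError("need at least one frame per text token")

    # q[i][j]: best cumulative log-likelihood of aligning frames 0..j
    # to tokens 0..i, with frame j assigned to token i.
    q = [[NEG_INF] * n_frames for _ in range(n_text)]
    q[0][0] = log_lik[0][0]
    for j in range(1, n_frames):
        # Token i cannot be reached before frame i, so cap i at j.
        for i in range(min(j, n_text - 1) + 1):
            stay = q[i][j - 1]                                # same token as previous frame
            advance = q[i - 1][j - 1] if i > 0 else NEG_INF   # move to the next token
            q[i][j] = max(stay, advance) + log_lik[i][j]

    # Backtrack from the last token at the last frame.
    alignment = [0] * n_frames
    i = n_text - 1
    for j in range(n_frames - 1, -1, -1):
        alignment[j] = i
        if j > 0 and i > 0 and q[i - 1][j - 1] >= q[i][j - 1]:
            i -= 1
    return alignment, q[n_text - 1][n_frames - 1]


# Toy example: two tokens, four frames. The first two frames fit token 0
# best and the last two fit token 1 best.
toy_log_lik = [
    [-1.0, -1.0, -5.0, -5.0],
    [-5.0, -5.0, -1.0, -1.0],
]
toy_alignment, toy_score = monotonic_alignment_search(toy_log_lik)
print(toy_alignment, toy_score)
```

In this sketch the table costs time proportional to the number of tokens times the number of frames. The per-token frame counts implied by the returned alignment are the kind of hard durations that such a search produces.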

Tasks

Task	Papers	Share
Text-To-Speech Synthesis	2	33.33%
Zero-Shot Multi-Speaker TTS	1	16.67%
Speech Synthesis	1	16.67%
Voice Conversion	1	16.67%
Word Alignment	1	16.67%

Components

Component	Type
GLOW	Generative Models

Categories

Text-to-Speech Models