TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Speech Synthesis	LibriTTS	WaveFlow	PESQ	3.027	# 9
Speech Synthesis	LibriTTS	WaveFlow	MCD	1.2455	# 6
Speech Synthesis	LibriTTS	WaveFlow	Periodicity	0.1416	# 8
Speech Synthesis	LibriTTS	WaveFlow	V/UV F1	0.9410	# 8
Speech Synthesis	LibriTTS	WaveFlow	M-STFT	1.1120	# 8

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/waveflow-a-compact-flow-based-model-for-raw-1/speech-synthesis-on-libritts)](https://paperswithcode.com/sota/speech-synthesis-on-libritts?p=waveflow-a-compact-flow-based-model-for-raw-1)`

WaveFlow: A Compact Flow-based Model for Raw Audio

ICML 2020 · Wei Ping, Kainan Peng, Kexin Zhao, Zhao Song ·

In this work, we propose WaveFlow, a small-footprint generative flow for raw audio, which is directly trained with maximum likelihood. It handles the long-range structure of 1-D waveform with a dilated 2-D convolutional architecture, while modeling the local variations using expressive autoregressive functions. WaveFlow provides a unified view of likelihood-based models for 1-D data, including WaveNet and WaveGlow as special cases. It generates high-fidelity speech as WaveNet, while synthesizing several orders of magnitude faster as it only requires a few sequential steps to generate very long waveforms with hundreds of thousands of time-steps. Furthermore, it can significantly reduce the likelihood gap that has existed between autoregressive models and flow-based models for efficient synthesis. Finally, our small-footprint WaveFlow has only 5.91M parameters, which is 15$\times$ smaller than WaveGlow. It can generate 22.05 kHz high-fidelity audio 42.6$\times$ faster than real-time (at a rate of 939.3 kHz) on a V100 GPU without engineered inference kernels.

PDF Abstract ICML 2020 PDF