TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Speech Synthesis	LibriTTS	BigVGAN	PESQ	4.027	# 5
Speech Synthesis	LibriTTS	BigVGAN	MCD	0.3745	# 2
Speech Synthesis	LibriTTS	BigVGAN	Periodicity	0.1018	# 6
Speech Synthesis	LibriTTS	BigVGAN	V/UV F1	0.9598	# 5
Speech Synthesis	LibriTTS	BigVGAN	M-STFT	0.7997	# 4
Speech Synthesis	LibriTTS	BigVGAN-base	PESQ	3.519	# 7
Speech Synthesis	LibriTTS	BigVGAN-base	MCD	0.4564	# 4
Speech Synthesis	LibriTTS	BigVGAN-base	Periodicity	0.1287	# 7
Speech Synthesis	LibriTTS	BigVGAN-base	V/UV F1	0.9459	# 7
Speech Synthesis	LibriTTS	BigVGAN-base	M-STFT	0.8788	# 5
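
The ten rows above are easier to read once each metric's direction is fixed. The sketch below copies the values from the table and compares the two models metric by metric. The `HIGHER_IS_BETTER` mapping is an assumption added here, not something stated on this page: PESQ and V/UV F1 are treated as scores to maximize, while MCD, Periodicity, and M-STFT are treated as errors or distances to minimize. The helper names are hypothetical.

```python
# Values copied from the results table above: (model, metric, value).
ROWS = [
    ("BigVGAN", "PESQ", 4.027),
    ("BigVGAN", "MCD", 0.3745),
    ("BigVGAN", "Periodicity", 0.1018),
    ("BigVGAN", "V/UV F1", 0.9598),
    ("BigVGAN", "M-STFT", 0.7997),
    ("BigVGAN-base", "PESQ", 3.519),
    ("BigVGAN-base", "MCD", 0.4564),
    ("BigVGAN-base", "Periodicity", 0.1287),
    ("BigVGAN-base", "V/UV F1", 0.9459),
    ("BigVGAN-base", "M-STFT", 0.8788),
]

# Assumption (not stated on the page): direction of each metric.
HIGHER_IS_BETTER = {
    "PESQ": True,
    "MCD": False,
    "Periodicity": False,
    "V/UV F1": True,
    "M-STFT": False,
}


def scores_for(metric):
    """Return {model: value} for one metric."""
    return {model: value for model, name, value in ROWS if name == metric}


def best_model(metric):
    """Pick the model with the better value under the assumed direction."""
    scores = scores_for(metric)
    pick = max if HIGHER_IS_BETTER[metric] else min
    return pick(scores, key=scores.get)


def relative_gain(metric, model="BigVGAN", baseline="BigVGAN-base"):
    """Relative change of `model` over `baseline`; positive means better."""
    scores = scores_for(metric)
    change = (scores[model] - scores[baseline]) / scores[baseline]
    return change if HIGHER_IS_BETTER[metric] else -change


if __name__ == "__main__":
    for metric in HIGHER_IS_BETTER:
        print(f"{metric:12s} best={best_model(metric):13s} "
              f"gain={relative_gain(metric):+.1%}")
```

Under these assumed directions, the comparison agrees with the Global Rank column: the larger BigVGAN model ranks ahead of BigVGAN-base on every metric.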


BigVGAN: A Universal Neural Vocoder with Large-Scale Training

9 Jun 2022 · Sang-gil Lee, Wei Ping, Boris Ginsburg, Bryan Catanzaro, Sungroh Yoon

Despite recent progress in generative adversarial network (GAN)-based vocoders, where the model generates raw waveforms conditioned on acoustic features, it is challenging to synthesize high-fidelity audio for numerous speakers across various recording environments. In this work, we present BigVGAN, a universal vocoder that generalizes well to various out-of-distribution scenarios without fine-tuning. We introduce a periodic activation function and an anti-aliased representation into the GAN generator, which bring the desired inductive bias for audio synthesis and significantly improve audio quality. In addition, we train our GAN vocoder at scales up to 112M parameters, which is unprecedented in the literature. We identify and address the failure modes in large-scale GAN training for audio, while maintaining high-fidelity output without over-regularization. Our BigVGAN, trained only on clean speech (LibriTTS), achieves state-of-the-art performance for various zero-shot (out-of-distribution) conditions, including unseen speakers, languages, recording environments, singing voices, music, and instrumental audio. We release our code and model at: https://github.com/NVIDIA/BigVGAN
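
The abstract does not name the periodic activation. As a minimal sketch, the code below assumes it is the Snake function, snake(x) = x + (1/α)·sin²(αx), which is the activation the BigVGAN paper builds on. In the paper α is a learned per-channel parameter; here it is a plain float argument, and the anti-aliasing step (upsample, apply the nonlinearity, low-pass filter, downsample) is deliberately left out. The function names are illustrative only.

```python
import math


def snake(x: float, alpha: float = 1.0) -> float:
    """Snake activation: x + (1/alpha) * sin^2(alpha * x).

    The linear term keeps the function monotone, while the sin^2 term adds
    a periodic component whose frequency is controlled by alpha.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return x + math.sin(alpha * x) ** 2 / alpha


def snake_channels(frames, alphas):
    """Apply Snake with one alpha per channel (assumed layout: frames[t][c])."""
    return [
        [snake(value, alpha) for value, alpha in zip(frame, alphas)]
        for frame in frames
    ]


if __name__ == "__main__":
    # Two time steps, two channels, a different alpha per channel.
    activations = snake_channels([[0.0, 0.5], [1.0, -1.0]], alphas=[1.0, 2.0])
    print(activations)
```

The periodic part, snake(x) − x, repeats every π/α, which is the sense in which the activation biases the generator toward periodic signals such as voiced speech.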


Code

Repository	Stars	Notes
nvidia/bigvgan	658	official
sh-lee-prml/hierspeechpp	1,075	Quickstart in Spaces
sh-lee-prml/BigVGAN	120	

Tasks


Audio Generation

Audio Synthesis

Generative Adversarial Network

Inductive Bias

Music Generation

Speech Synthesis

Datasets

VCTK

LJSpeech

LibriTTS

MUSDB18-HQ

Results from the Paper


Ranked #5 on Speech Synthesis on LibriTTS


