Voice Conversion

150 papers with code • 2 benchmarks • 5 datasets

Voice Conversion is a technology that modifies a source speaker's speech so that it sounds like that of a target speaker, without changing the linguistic content.

Source: Joint training framework for text-to-speech and voice conversion using multi-source Tacotron and WaveNet

Benchmarks

These leaderboards are used to track progress in Voice Conversion.

Dataset	Best Model
ZeroSpeech 2019 English	VQ-CPC
LibriSpeech test-clean	kNN-VC (prematched HiFiGAN)

Libraries

Use these libraries to find Voice Conversion models and implementations.

espnet/espnet • 3 papers • 7,907 stars

s3prl/s3prl • 3 papers • 2,101 stars

andi611/Self-Supervised-Speech-Pret… • 3 papers • 2,101 stars

unilight/seq2seq-vc • 3 papers

See all 5 libraries.

Datasets

Most implemented papers

YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone

coqui-ai/TTS • 4 Dec 2021

YourTTS brings the power of a multilingual approach to the task of zero-shot multi-speaker TTS.

Epoch-Synchronous Overlap-Add (ESOLA) for Time- and Pitch-Scale Modification of Speech Signals

vinusankars/ESOLA • 19 Jan 2018

Time- and pitch-scale modifications of speech signals find important applications in speech synthesis, playback systems, voice conversion, learning/hearing aids, etc.

Scalable Factorized Hierarchical Variational Autoencoder Training

wnhsu/ScalableFHVAE • 9 Apr 2018

Deep generative models have achieved great success in unsupervised learning with the ability to capture complex nonlinear relationships between latent generating factors and observations.

ACVAE-VC: Non-parallel many-to-many voice conversion with auxiliary classifier variational autoencoder

aoixcat/ACVAE-VC • 13 Aug 2018

Such situations can be avoided by introducing an auxiliary classifier and training the encoder and decoder so that the attribute classes of the decoder outputs are correctly predicted by the classifier.
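
As a rough illustration of the auxiliary-classifier idea in this excerpt, the sketch below scores decoder outputs by how well a classifier recovers the attribute class they were conditioned on. The excerpt does not specify the loss, so the cross-entropy form, the function names, and the example logits are all assumptions made for illustration and are not taken from the ACVAE-VC code.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def classifier_penalty(class_logits, conditioning_class):
    # Assumed form: mean cross-entropy between the classifier's prediction
    # on each decoder output and the class the decoder was conditioned on.
    probs = softmax(class_logits)
    return float(-np.mean(np.log(probs[:, conditioning_class])))

# Hypothetical classifier logits for two decoder outputs conditioned on class 1.
matching = np.array([[0.1, 4.0, 0.2], [0.0, 3.5, 0.1]])     # classifier predicts class 1
mismatching = np.array([[4.0, 0.1, 0.2], [3.5, 0.0, 0.1]])  # classifier predicts class 0

print(classifier_penalty(matching, 1))
print(classifier_penalty(mismatching, 1))
```

Under this assumption, adding the penalty to the training objective pushes the encoder and decoder toward outputs whose attribute class the classifier predicts correctly, since mismatching outputs cost more.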

Generative Adversarial Networks for Unpaired Voice Transformation on Impaired Speech

b04901014/ISGAN • 30 Oct 2018

This paper focuses on using voice conversion (VC) to improve the speech intelligibility of surgical patients who have had parts of their articulators removed.

Non-Parallel Voice Conversion with Cyclic Variational Autoencoder

patrickltobing/cyclevae-vc • 24 Jul 2019

In this work, to overcome this problem, we propose to use a CycleVAE-based spectral model that indirectly optimizes the conversion flow by recycling the converted features back into the system to obtain corresponding cyclic reconstructed spectra that can be directly optimized.
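
The recycling step described in this excerpt can be pictured with a toy sketch. Everything here is an assumption made for illustration: the linear encoder and decoder stand in for trained networks, the encoder is taken to be speaker-independent, the VAE's sampling and KL term are omitted, and none of the names come from the cyclevae-vc repository.

```python
import numpy as np

rng = np.random.default_rng(0)
FEAT_DIM, LATENT_DIM, N_SPEAKERS = 8, 4, 2

# Hypothetical random weights standing in for trained networks.
W_enc = rng.normal(scale=0.1, size=(FEAT_DIM, LATENT_DIM))
W_dec = rng.normal(scale=0.1, size=(LATENT_DIM + N_SPEAKERS, FEAT_DIM))

def one_hot(speaker):
    code = np.zeros(N_SPEAKERS)
    code[speaker] = 1.0
    return code

def encode(frames):
    # Deterministic stand-in for the encoder (no sampling in this sketch).
    return frames @ W_enc

def decode(latents, speaker):
    # The decoder sees the latent features plus a speaker code.
    codes = np.tile(one_hot(speaker), (len(latents), 1))
    return np.concatenate([latents, codes], axis=1) @ W_dec

def cyclic_pass(source_frames, src, tgt):
    z = encode(source_frames)
    reconstructed = decode(z, src)   # has a target: the source frames
    converted = decode(z, tgt)       # no target in non-parallel data
    z_cyc = encode(converted)        # recycle the converted features
    cyclic = decode(z_cyc, src)      # has a target again: the source frames
    return reconstructed, converted, cyclic

frames = rng.normal(size=(5, FEAT_DIM))
recon, conv, cyc = cyclic_pass(frames, src=0, tgt=1)
recon_loss = np.mean((recon - frames) ** 2)
cyclic_loss = np.mean((cyc - frames) ** 2)
print(recon_loss, cyclic_loss)
```

In this sketch the converted features never appear in a loss directly. They are only optimized through `cyclic_loss`, which is one way to read "indirectly optimizes the conversion flow".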

MelGAN-VC: Voice Conversion and Audio Style Transfer on arbitrarily long samples using Spectrograms

marcoppasini/MelGAN-VC • 8 Oct 2019

We propose MelGAN-VC, a voice conversion method that relies on non-parallel speech data and is able to convert audio signals of arbitrary length from a source voice to a target voice.
