Voice Conversion

150 papers with code • 2 benchmarks • 5 datasets

Voice Conversion is a technology that modifies the speech of a source speaker and makes their speech sound like that of another target speaker without changing the linguistic information.

Source: Joint training framework for text-to-speech and voice conversion using multi-source Tacotron and WaveNet

Most implemented papers

YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone

coqui-ai/TTS 4 Dec 2021

YourTTS brings the power of a multilingual approach to the task of zero-shot multi-speaker TTS.

Epoch-Synchronous Overlap-Add (ESOLA) for Time- and Pitch-Scale Modification of Speech Signals

vinusankars/ESOLA 19 Jan 2018

Time- and pitch-scale modifications of speech signals find important applications in speech synthesis, playback systems, voice conversion, learning/hearing aids, etc.

Scalable Factorized Hierarchical Variational Autoencoder Training

wnhsu/ScalableFHVAE 9 Apr 2018

Deep generative models have achieved great success in unsupervised learning with the ability to capture complex nonlinear relationships between latent generating factors and observations.

ACVAE-VC: Non-parallel many-to-many voice conversion with auxiliary classifier variational autoencoder

aoixcat/ACVAE-VC 13 Aug 2018

Such situations can be avoided by introducing an auxiliary classifier and training the encoder and decoder so that the attribute classes of the decoder outputs are correctly predicted by the classifier.

Generative Adversarial Networks for Unpaired Voice Transformation on Impaired Speech

b04901014/ISGAN 30 Oct 2018

This paper focuses on using voice conversion (VC) to improve the speech intelligibility of surgical patients who have had parts of their articulators removed.

Non-Parallel Voice Conversion with Cyclic Variational Autoencoder

patrickltobing/cyclevae-vc 24 Jul 2019

In this work, to overcome this problem, we propose to use a CycleVAE-based spectral model that indirectly optimizes the conversion flow by recycling the converted features back into the system to obtain corresponding cyclic reconstructed spectra that can be directly optimized.
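
The recycling idea in this summary can be illustrated with a toy data flow. The sketch below is not the CycleVAE model: `toy_convert`, `OFFSETS`, and the speaker labels are hypothetical stand-ins for a trained encoder/decoder, used only to show why the cyclic output can be compared against the input when no parallel target exists.

```python
from typing import Callable, List

Features = List[float]


def cyclic_reconstruction(
    x_src: Features,
    convert: Callable[[Features, str, str], Features],
    src: str,
    tgt: str,
) -> Features:
    """Convert src -> tgt, then feed the result back as tgt -> src."""
    converted = convert(x_src, src, tgt)  # no ground truth exists for this output
    return convert(converted, tgt, src)   # comparable to x_src, so a loss is defined


def l1_loss(a: Features, b: Features) -> float:
    """Mean absolute difference between two equally long feature vectors."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


# Hypothetical stand-in for a trained model: each speaker is a fixed offset.
OFFSETS = {"A": 1.0, "B": -0.5}


def toy_convert(x: Features, src: str, tgt: str) -> Features:
    """Remove the source speaker offset and add the target speaker offset."""
    return [v - OFFSETS[src] + OFFSETS[tgt] for v in x]


x = [1.0, 2.0, 3.0]
cyclic = cyclic_reconstruction(x, toy_convert, "A", "B")
cycle_loss = l1_loss(x, cyclic)
```

In a real system `convert` would be a learned network, and `cycle_loss` would be one training term among others.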

MelGAN-VC: Voice Conversion and Audio Style Transfer on arbitrarily long samples using Spectrograms

marcoppasini/MelGAN-VC 8 Oct 2019

We propose MelGAN-VC, a voice conversion method that relies on non-parallel speech data and is able to convert audio signals of arbitrary length from a source voice to a target voice.

Mel-spectrogram augmentation for sequence to sequence voice conversion

makcedward/nlpaug 6 Jan 2020

In addition, we proposed new policies (i.e., frequency warping, loudness and time length control) for more data variations.
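
As a rough illustration of two of the policies named here, the sketch below applies loudness control and time length control to a spectrogram stored as a list of frames. It is a minimal sketch under stated assumptions, not the paper's or nlpaug's implementation: the function names, the linear-magnitude representation, and the use of linear interpolation are all assumptions, and frequency warping is left out.

```python
from typing import List

Spectrogram = List[List[float]]  # frames x mel bins, linear-magnitude values (assumed)


def scale_loudness(spec: Spectrogram, gain: float) -> Spectrogram:
    """Multiply every magnitude by a constant gain (hypothetical loudness policy)."""
    return [[value * gain for value in frame] for frame in spec]


def stretch_time(spec: Spectrogram, rate: float) -> Spectrogram:
    """Resample along the time axis by linear interpolation between frames.

    rate > 1 yields more frames (longer utterance), rate < 1 fewer frames.
    """
    if not spec:
        return []
    n_in = len(spec)
    n_out = max(1, round(n_in * rate))
    out = []
    for i in range(n_out):
        # Map output frame i to a fractional position in the input.
        pos = i * (n_in - 1) / (n_out - 1) if n_out > 1 else 0.0
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        frac = pos - lo
        out.append([(1 - frac) * a + frac * b for a, b in zip(spec[lo], spec[hi])])
    return out


spec = [[0.0, 2.0], [2.0, 4.0], [4.0, 6.0]]
quieter = scale_loudness(spec, 0.5)
longer = stretch_time(spec, 2.0)
```

Both functions return new lists and leave the input untouched, so several augmented variants can be derived from one utterance.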

Cotatron: Transcription-Guided Speech Encoder for Any-to-Many Voice Conversion without Parallel Data

mindslab-ai/cotatron 7 May 2020

We propose Cotatron, a transcription-guided speech encoder for speaker-independent linguistic representation.