Voice Conversion
150 papers with code • 2 benchmarks • 5 datasets
Voice Conversion is a technology that modifies the speech of a source speaker and makes their speech sound like that of another target speaker without changing the linguistic information.
Libraries
Use these libraries to find Voice Conversion models and implementationsMost implemented papers
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
YourTTS brings the power of a multilingual approach to the task of zero-shot multi-speaker TTS.
Epoch-Synchronous Overlap-Add (ESOLA) for Time- and Pitch-Scale Modification of Speech Signals
Time- and pitch-scale modifications of speech signals find important applications in speech synthesis, playback systems, voice conversion, learning/hearing aids, etc..
Scalable Factorized Hierarchical Variational Autoencoder Training
Deep generative models have achieved great success in unsupervised learning with the ability to capture complex nonlinear relationships between latent generating factors and observations.
ACVAE-VC: Non-parallel many-to-many voice conversion with auxiliary classifier variational autoencoder
Such situations can be avoided by introducing an auxiliary classifier and training the encoder and decoder so that the attribute classes of the decoder outputs are correctly predicted by the classifier.
Generative Adversarial Networks for Unpaired Voice Transformation on Impaired Speech
This paper focuses on using voice conversion (VC) to improve the speech intelligibility of surgical patients who have had parts of their articulators removed.
Non-Parallel Voice Conversion with Cyclic Variational Autoencoder
In this work, to overcome this problem, we propose to use CycleVAE-based spectral model that indirectly optimizes the conversion flow by recycling the converted features back into the system to obtain corresponding cyclic reconstructed spectra that can be directly optimized.
MelGAN-VC: Voice Conversion and Audio Style Transfer on arbitrarily long samples using Spectrograms
We propose MelGAN-VC, a voice conversion method that relies on non-parallel speech data and is able to convert audio signals of arbitrary length from a source voice to a target voice.
Mel-spectrogram augmentation for sequence to sequence voice conversion
In addition, we proposed new policies (i. e., frequency warping, loudness and time length control) for more data variations.
Comparison of Speech Representations for Automatic Quality Estimation in Multi-Speaker Text-to-Speech Synthesis
Our NN predicts MOS with a high correlation to human judgments.
Cotatron: Transcription-Guided Speech Encoder for Any-to-Many Voice Conversion without Parallel Data
We propose Cotatron, a transcription-guided speech encoder for speaker-independent linguistic representation.