Voice Conversion
149 papers with code • 2 benchmarks • 5 datasets
Voice Conversion is a technology that modifies a source speaker's speech so that it sounds like that of a target speaker, without changing the linguistic content.
Latest papers
Non-Parallel Training Approach for Emotional Voice Conversion Using CycleGAN
This research proposes a non-parallel emotional voice conversion approach for Egyptian Arabic speech.
BiSinger: Bilingual Singing Voice Synthesis
We fuse monolingual singing datasets with open-source singing voice conversion techniques to generate bilingual singing voices while also exploring the potential use of bilingual speech data.
Emo-StarGAN: A Semi-Supervised Any-to-Many Non-Parallel Emotion-Preserving Voice Conversion
Speech anonymisation prevents misuse of spoken data by removing any personal identifier while preserving at least linguistic content.
StarGAN-VC++: Towards Emotion Preserving Voice Conversion Using Deep Embeddings
In this paper, we show that StarGANv2-VC fails to disentangle the speaker and emotion representations, which is pertinent to preserving emotion.
Evaluating Methods for Ground-Truth-Free Foreign Accent Conversion
In this work, we evaluate three recently proposed methods for ground-truth-free foreign accent conversion (FAC), all of which aim to harness the power of sequence-to-sequence (seq2seq) and non-parallel VC models to properly convert the accent and control the speaker identity.
FSD: An Initial Chinese Dataset for Fake Song Detection
In this paper, we initially construct a Chinese Fake Song Detection (FSD) dataset to investigate the field of song deepfake detection.
Phoneme Hallucinator: One-shot Voice Conversion via Set Expansion
Objective and subjective evaluations show that Phoneme Hallucinator outperforms existing VC methods for both intelligibility and speaker similarity.
Anonymizing Speech: Evaluating and Designing Speaker Anonymization Techniques
The growing use of voice user interfaces has led to a surge in the collection and storage of speech data.
Rhythm Modeling for Voice Conversion
Voice conversion aims to transform source speech into a different target voice.
Disentanglement in a GAN for Unconditional Speech Synthesis
We confirm that ASGAN's latent space is disentangled: we demonstrate how simple linear operations in the space can be used to perform several tasks unseen during training.