Voice Conversion

149 papers with code • 2 benchmarks • 5 datasets

Voice conversion modifies the speech of a source speaker so that it sounds like that of a target speaker, without changing the linguistic content.

Source: Joint training framework for text-to-speech and voice conversion using multi-source Tacotron and WaveNet
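
To make the definition above concrete, here is a minimal sketch of one simple family of approaches, nearest-neighbour frame matching, loosely in the spirit of the kNN-VC entry listed in the benchmarks. It assumes speech has already been turned into per-frame feature vectors by some speech encoder, and that a vocoder would turn the converted features back into audio; neither step is shown. The function names, the toy 2-dimensional features, and the choice of k are hypothetical and purely illustrative, not any paper's implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn_convert(source_frames, target_pool, k=4):
    """Replace each source frame with the mean of its k most similar target frames.

    source_frames: feature vectors from the source utterance (carries the content).
    target_pool:   feature vectors from the target speaker (carries the voice).
    Both are assumed to come from the same (hypothetical) feature extractor.
    """
    if not target_pool:
        raise ValueError("target_pool must contain at least one frame")
    converted = []
    for frame in source_frames:
        # Rank target frames from most to least similar to this source frame.
        ranked = sorted(
            target_pool,
            key=lambda t: cosine_similarity(frame, t),
            reverse=True,
        )
        neighbours = ranked[:k]
        # Average the selected target frames dimension by dimension.
        converted.append([sum(col) / len(neighbours) for col in zip(*neighbours)])
    return converted


# Toy example: two source frames, four target-speaker frames.
source_frames = [[1.0, 0.0], [0.0, 1.0]]
target_pool = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
converted = knn_convert(source_frames, target_pool, k=2)
print(converted)
```

Because every output frame is built only from target-speaker frames, the result takes on the target's voice characteristics, while the frame order (and so the spoken content) follows the source. In practice the quality depends heavily on the feature extractor and vocoder, which this sketch leaves out.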

Benchmarks

These leaderboards track progress in Voice Conversion.

Dataset	Best Model
ZeroSpeech 2019 English	VQ-CPC
LibriSpeech test-clean	kNN-VC (prematched HiFiGAN)

Libraries

Use these libraries to find Voice Conversion models and implementations.

Library	Papers	GitHub stars
espnet/espnet	3	7,884
andi611/Self-Supervised-Speech-Pret…	3	2,095
s3prl/s3prl	3	2,094
unilight/seq2seq-vc	3	

Datasets

Latest papers

Non-Parallel Training Approach for Emotional Voice Conversion Using CycleGAN

MohamedElsayed-22/non-parallel-training-for-emotion-conversion-of-arabic-speech-using-cycleGAN-and-WORLD-Vocoder • 20th International Conference on Informatics in Control, Automation and Robotics 2023

The focus of this research is proposing a nonparallel emotional voice conversion for Egyptian Arabic speech.

01 Nov 2023

BiSinger: Bilingual Singing Voice Synthesis

BiSinger-SVS/BiSinger • 25 Sep 2023

We fuse monolingual singing datasets with open-source singing voice conversion techniques to generate bilingual singing voices while also exploring the potential use of bilingual speech data.

Emo-StarGAN: A Semi-Supervised Any-to-Many Non-Parallel Emotion-Preserving Voice Conversion

suhitaghosh10/emo-stargan • 14 Sep 2023

Speech anonymisation prevents misuse of spoken data by removing any personal identifier while preserving at least linguistic content.

StarGAN-VC++: Towards Emotion Preserving Voice Conversion Using Deep Embeddings

arnabdas8901/StarGAN-VC_PlusPlus • 14 Sep 2023

In this paper, we show that StarGANv2-VC fails to disentangle the speaker and emotion representations, which is pertinent to preserving emotion.

Evaluating Methods for Ground-Truth-Free Foreign Accent Conversion

unilight/seq2seq-vc • 5 Sep 2023

In this work, we evaluate three recently proposed methods for ground-truth-free FAC, where all of them aim to harness the power of sequence-to-sequence (seq2seq) and non-parallel VC models to properly convert the accent and control the speaker identity.

FSD: An Initial Chinese Dataset for Fake Song Detection

xieyuankun/fsd-dataset • 5 Sep 2023

In this paper, we initially construct a Chinese Fake Song Detection (FSD) dataset to investigate the field of song deepfake detection.

Phoneme Hallucinator: One-shot Voice Conversion via Set Expansion

PhonemeHallucinator/Phoneme_Hallucinator • 11 Aug 2023

Objective and subjective evaluations show that Phoneme Hallucinator outperforms existing VC methods for both intelligibility and speaker similarity.

Anonymizing Speech: Evaluating and Designing Speaker Anonymization Techniques

voice-privacy-challenge/voice-privacy-challenge-2024 • 5 Aug 2023

The growing use of voice user interfaces has led to a surge in the collection and storage of speech data.

Rhythm Modeling for Voice Conversion

bshall/urhythmic • 12 Jul 2023

Voice conversion aims to transform source speech into a different target voice.

Disentanglement in a GAN for Unconditional Speech Synthesis

rf5/simple-asgan • 4 Jul 2023

We confirm that ASGAN's latent space is disentangled: we demonstrate how simple linear operations in the space can be used to perform several tasks unseen during training.
