Voice Conversion

149 papers with code • 2 benchmarks • 5 datasets

Voice Conversion is a technology that modifies the speech of a source speaker and makes their speech sound like that of another target speaker without changing the linguistic information.

Source: Joint training framework for text-to-speech and voice conversion using multi-source Tacotron and WaveNet

Benchmarks

These leaderboards track progress in Voice Conversion.

Dataset	Best Model
ZeroSpeech 2019 English	VQ-CPC
LibriSpeech test-clean	kNN-VC (prematched HiFiGAN)

Libraries

Use these libraries to find Voice Conversion models and implementations

Library	Papers	Stars
espnet/espnet	3	7,875
s3prl/s3prl	3	2,092
andi611/Self-Supervised-Speech-Pret…	3	2,092
unilight/seq2seq-vc	3

See all 5 libraries.

Most implemented papers

StarGAN-VC: Non-parallel many-to-many voice conversion with star generative adversarial networks

kamepong/StarGAN-VC • 6 Jun 2018

This paper proposes a method that allows non-parallel many-to-many voice conversion (VC) by using a variant of a generative adversarial network (GAN) called StarGAN.

One-shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization

jjery2243542/adaptive_voice_conversion • 10 Apr 2019

Recently, voice conversion (VC) without parallel data has been successfully adapted to multi-target scenario in which a single model is trained to convert the input voice to many different speakers.

AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss

liusongxiang/StarGAN-Voice-Conversion • 14 May 2019

On the other hand, CVAE training is simple but does not come with the distribution-matching property of a GAN.

Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks

jackaduma/CycleGAN-VC2 • 30 Nov 2017

A subjective evaluation showed that the quality of the converted speech was comparable to that obtained with a Gaussian mixture model-based method under advantageous conditions with parallel and twice the amount of data.

CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion

jackaduma/CycleGAN-VC2 • 9 Apr 2019

Non-parallel voice conversion (VC) is a technique for learning the mapping from source to target speech without relying on parallel data.

MOSNet: Deep Learning based Objective Assessment for Voice Conversion

lochenchou/MOSNet • 17 Apr 2019

In this paper, we propose deep learning-based assessment models to predict human ratings of converted speech.

Unsupervised Speech Decomposition via Triple Information Bottleneck

auspicious3000/SpeechSplit • ICML 2020

Speech information can be roughly decomposed into four components: language content, timbre, pitch, and rhythm.

Utilizing Self-supervised Representations for MOS Prediction

s3prl/s3prl • 7 Apr 2021

In this paper, we use self-supervised pre-trained models for MOS prediction.

Defense for Black-box Attacks on Anti-spoofing Models by Self-Supervised Learning

s3prl/s3prl • 5 Jun 2020

To explore this issue, we proposed to employ Mockingjay, a self-supervised learning based model, to protect anti-spoofing models against adversarial attacks in the black-box scenario.

Voice Conversion from Non-parallel Corpora Using Variational Auto-encoder

jackaduma/CycleGAN-VC2 • 13 Oct 2016

We propose a flexible framework for spectral conversion (SC) that facilitates training with unaligned corpora.
