Voice Conversion
149 papers with code • 2 benchmarks • 5 datasets
Voice Conversion is a technology that modifies the speech of a source speaker so that it sounds like the speech of a target speaker, while leaving the linguistic content unchanged.
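Most neural voice conversion systems factor speech into a speaker-independent content representation and a speaker representation, then resynthesize the content with the target speaker's identity. A minimal sketch of that data flow, with random linear maps standing in for learned networks (all dimensions and function names here are illustrative assumptions, not taken from any specific paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for learned networks: random linear projections.
D_FEAT, D_CONTENT, D_SPK = 80, 64, 32   # mel bins, content dim, speaker dim
W_content = rng.standard_normal((D_FEAT, D_CONTENT)) * 0.1
W_speaker = rng.standard_normal((D_FEAT, D_SPK)) * 0.1
W_decode = rng.standard_normal((D_CONTENT + D_SPK, D_FEAT)) * 0.1

def content_encoder(mel):
    # (T, 80) -> (T, 64): frame-level, ideally speaker-independent features
    return mel @ W_content

def speaker_encoder(mel):
    # (T, 80) -> (32,): one utterance-level speaker embedding (mean pooling)
    return (mel @ W_speaker).mean(axis=0)

def decoder(content, spk):
    # Fuse every content frame with the single speaker vector, then project
    spk_tiled = np.tile(spk, (content.shape[0], 1))
    return np.concatenate([content, spk_tiled], axis=1) @ W_decode

source_mel = rng.standard_normal((120, D_FEAT))   # source utterance, 120 frames
target_mel = rng.standard_normal((200, D_FEAT))   # reference from target speaker

# Source content + target speaker identity = converted features
converted = decoder(content_encoder(source_mel), speaker_encoder(target_mel))
print(converted.shape)  # (120, 80): source timing and content, target identity
```

In a real system the converted features would be passed to a vocoder to produce a waveform; the point of the sketch is only the content/speaker factorization.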
Latest papers with no code
Transfer the linguistic representations from TTS to accent conversion with non-parallel data
This paper introduces a novel non-autoregressive framework for accent conversion that learns accent-agnostic linguistic representations and employs them to convert the accent in the source speech.
StreamVC: Real-Time Low-Latency Voice Conversion
We present StreamVC, a streaming voice conversion solution that preserves the content and prosody of any source speech while matching the voice timbre from any target speech.
CoMoSVC: Consistency Model-based Singing Voice Conversion
Diffusion-based Singing Voice Conversion (SVC) methods have achieved remarkable performance, producing natural audio with high similarity to the target timbre.
Attention-based Interactive Disentangling Network for Instance-level Emotional Voice Conversion
We introduce a two-stage pipeline to effectively train our network: Stage I utilizes inter-speech contrastive learning to model fine-grained emotion and intra-speech disentanglement learning to better separate emotion and content.
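Contrastive objectives of the kind mentioned above pull embeddings of positive pairs (e.g. same emotion) together and push negatives apart; a common instantiation is the InfoNCE loss. A minimal numpy sketch (the pairing scheme and dimensions are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE: -log softmax of the anchor-positive similarity
    against the anchor-negative similarities, at temperature tau."""
    def cos(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    logits = np.array([cos(anchor, positive)] +
                      [cos(anchor, n) for n in negatives]) / tau
    logits -= logits.max()                      # numerical stability
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())

rng = np.random.default_rng(1)
emb = rng.standard_normal(16)                   # anchor embedding
pos = emb + 0.05 * rng.standard_normal(16)      # positive: slightly perturbed
negs = [rng.standard_normal(16) for _ in range(8)]  # unrelated negatives

loss_good = info_nce(emb, pos, negs)            # positive is aligned: low loss
loss_bad = info_nce(emb, negs[0], [pos] + negs[1:])  # mislabeled pair: high loss
print(loss_good < loss_bad)  # True
```

Minimizing this loss over many anchors is what shapes an embedding space in which, here, emotion is clustered while other factors are pushed toward separate representations.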
AE-Flow: AutoEncoder Normalizing Flow
The results show that the proposed training paradigm systematically improves speaker similarity and naturalness when compared to regular training methods of normalizing flows.
Exploring data augmentation in bias mitigation against non-native-accented speech
We aim to mitigate the bias against non-native-accented Flemish in a Flemish ASR system.
Creating New Voices using Normalizing Flows
As there is growing interest in synthesizing voices of new speakers, here we investigate the ability of normalizing flows in text-to-speech (TTS) and voice conversion (VC) modes to extrapolate from speakers observed during training to create unseen speaker identities.
SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention
Zero-shot voice conversion (VC) aims to convert source speech to the timbre of an arbitrary unseen target speaker while keeping the linguistic content unchanged.
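A speaker-embedding-free design of the kind the title describes replaces the single speaker vector with cross attention: each source frame queries the target reference frames directly. A minimal scaled dot-product sketch (the tensor shapes and function names are illustrative assumptions):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values):
    """Scaled dot-product cross attention: every source frame (query)
    gathers timbre information from the reference frames (keys=values)."""
    d_k = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d_k)   # (T_src, T_ref)
    weights = softmax(scores, axis=-1)                # rows sum to 1
    return weights @ keys_values                      # (T_src, d)

rng = np.random.default_rng(2)
source_content = rng.standard_normal((100, 64))  # content frames of source speech
target_frames = rng.standard_normal((250, 64))   # reference frames, target speaker

timbre = cross_attention(source_content, target_frames)
print(timbre.shape)  # (100, 64): per-frame timbre pulled from the reference
```

Compared with a pooled speaker embedding, attending to the full reference lets the model use time-varying cues from the target utterance rather than a single averaged vector.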
PerMod: Perceptually Grounded Voice Modification with Latent Diffusion Models
Perceptual modification of voice is an elusive goal.
Vulnerability of Automatic Identity Recognition to Audio-Visual Deepfakes
From the publicly available speech dataset LibriTTS, we also created a separate database of audio-only deepfakes, LibriTTS-DF, using several recent text-to-speech methods: YourTTS, Adaspeech, and TorToiSe.