Voice Conversion

149 papers with code • 2 benchmarks • 5 datasets

Voice Conversion is a technology that modifies the speech of a source speaker so that it sounds as though it were spoken by a target speaker, without changing the linguistic content.

Source: Joint training framework for text-to-speech and voice conversion using multi-source Tacotron and WaveNet
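
For illustration only (this is not the architecture of the cited paper), a common neural voice conversion recipe factors speech into a speaker-independent content representation and a speaker representation, then decodes the two back into speech. The ContentEncoder, SpeakerEncoder, and Decoder modules below are hypothetical placeholders with arbitrary dimensions.

    # Illustrative sketch of a generic encoder-decoder voice conversion pipeline
    # (hypothetical module names and sizes, not from the cited paper).
    import torch
    import torch.nn as nn

    class ContentEncoder(nn.Module):          # extracts speaker-independent content
        def __init__(self, n_mels=80, dim=256):
            super().__init__()
            self.net = nn.Sequential(nn.Conv1d(n_mels, dim, 5, padding=2), nn.ReLU(),
                                     nn.Conv1d(dim, dim, 5, padding=2))
        def forward(self, mel):                # mel: (batch, n_mels, frames)
            return self.net(mel)               # (batch, dim, frames)

    class SpeakerEncoder(nn.Module):          # summarizes the target-speaker timbre
        def __init__(self, n_mels=80, dim=256):
            super().__init__()
            self.proj = nn.Linear(n_mels, dim)
        def forward(self, mel):                # mean-pool over time -> one embedding
            return self.proj(mel.mean(dim=2))  # (batch, dim)

    class Decoder(nn.Module):                 # re-synthesizes mel with the target timbre
        def __init__(self, dim=256, n_mels=80):
            super().__init__()
            self.net = nn.Conv1d(dim * 2, n_mels, 5, padding=2)
        def forward(self, content, spk):
            spk = spk.unsqueeze(2).expand(-1, -1, content.size(2))
            return self.net(torch.cat([content, spk], dim=1))

    src_mel = torch.randn(1, 80, 200)          # source utterance (linguistic content)
    tgt_mel = torch.randn(1, 80, 150)          # target utterance (desired timbre)
    converted = Decoder()(ContentEncoder()(src_mel), SpeakerEncoder()(tgt_mel))
    print(converted.shape)                     # torch.Size([1, 80, 200])

Such a pipeline is typically trained with a reconstruction loss on the source speaker's own speech; at conversion time the speaker representation is simply swapped for the target speaker's.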


Latest papers with no code

Transfer the linguistic representations from TTS to accent conversion with non-parallel data

no code yet • 7 Jan 2024

This paper introduces a novel non-autoregressive framework for accent conversion that learns accent-agnostic linguistic representations and employs them to convert the accent in the source speech.

StreamVC: Real-Time Low-Latency Voice Conversion

no code yet • 5 Jan 2024

We present StreamVC, a streaming voice conversion solution that preserves the content and prosody of any source speech while matching the voice timbre from any target speech.
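
StreamVC itself has no public code; purely to make "streaming" concrete, the sketch below processes audio in short chunks with a small rolling left-context buffer, so latency is bounded by the chunk size rather than the utterance length. The convert_chunk function is a hypothetical stand-in for the actual model.

    # Hypothetical streaming loop: convert audio chunk by chunk to bound latency.
    import numpy as np

    SR = 16000
    CHUNK = 320            # 20 ms of audio at 16 kHz
    CONTEXT = 640          # 40 ms of left context kept in a rolling buffer

    def convert_chunk(ctx_and_chunk):
        # Stand-in for the real model: identity "conversion" of the newest chunk.
        return ctx_and_chunk[-CHUNK:]

    stream = np.random.randn(SR * 2).astype(np.float32)   # 2 s of fake input audio
    buffer = np.zeros(CONTEXT, dtype=np.float32)
    out = []
    for start in range(0, len(stream) - CHUNK + 1, CHUNK):
        chunk = stream[start:start + CHUNK]
        buffer = np.concatenate([buffer, chunk])[-(CONTEXT + CHUNK):]
        out.append(convert_chunk(buffer))                  # only past context is used
    converted = np.concatenate(out)                        # same length as the input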

CoMoSVC: Consistency Model-based Singing Voice Conversion

no code yet • 3 Jan 2024

Diffusion-based Singing Voice Conversion (SVC) methods have achieved remarkable performance, producing natural audio with high similarity to the target timbre.

Attention-based Interactive Disentangling Network for Instance-level Emotional Voice Conversion

no code yet • 29 Dec 2023

We introduce a two-stage pipeline to effectively train our network: Stage I utilizes inter-speech contrastive learning to model fine-grained emotion and intra-speech disentanglement learning to better separate emotion and content.
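
The paper's exact objectives are not reproduced here; as a generic illustration of the contrastive ingredient, an InfoNCE-style loss pulls embeddings of same-emotion utterance pairs together and pushes the other pairs in the batch apart. The embeddings below are random stand-ins.

    # Generic InfoNCE-style contrastive loss between paired emotion embeddings
    # (random stand-ins, not the paper's actual features or loss).
    import torch
    import torch.nn.functional as F

    def info_nce(anchors, positives, temperature=0.1):
        # anchors[i] and positives[i] come from utterances with the same emotion;
        # every other row in the batch acts as a negative.
        a = F.normalize(anchors, dim=1)
        p = F.normalize(positives, dim=1)
        logits = a @ p.t() / temperature          # (batch, batch) similarity matrix
        labels = torch.arange(a.size(0))          # the matching index is the positive
        return F.cross_entropy(logits, labels)

    anchors = torch.randn(16, 128)                # 16 emotion embeddings
    positives = torch.randn(16, 128)              # their same-emotion counterparts
    print(info_nce(anchors, positives).item())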

AE-Flow: AutoEncoder Normalizing Flow

no code yet • 27 Dec 2023

The results show that the proposed training paradigm systematically improves speaker similarity and naturalness compared to standard normalizing-flow training.
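
As a minimal sketch of the normalizing-flow machinery (not the AE-Flow model itself), the snippet below implements one RealNVP-style affine coupling layer and the exact log-likelihood given by the change-of-variables formula; the dimensions and the coupling network are arbitrary choices.

    # Minimal RealNVP-style affine coupling layer with exact log-likelihood
    # (illustrative only; not the AE-Flow architecture).
    import torch
    import torch.nn as nn

    class AffineCoupling(nn.Module):
        def __init__(self, dim=16):
            super().__init__()
            self.half = dim // 2
            self.net = nn.Sequential(nn.Linear(self.half, 64), nn.ReLU(),
                                     nn.Linear(64, dim))   # outputs scale and shift

        def forward(self, x):                  # x -> z, plus log|det Jacobian|
            x1, x2 = x[:, :self.half], x[:, self.half:]
            s, t = self.net(x1).chunk(2, dim=1)
            s = torch.tanh(s)                  # keep scales numerically stable
            z2 = x2 * torch.exp(s) + t
            return torch.cat([x1, z2], dim=1), s.sum(dim=1)

    flow = AffineCoupling(dim=16)
    x = torch.randn(8, 16)                     # e.g. a batch of acoustic features
    z, logdet = flow(x)
    prior = torch.distributions.Normal(0.0, 1.0)
    # Change of variables: log p(x) = log p(z) + log|det dz/dx|
    log_px = prior.log_prob(z).sum(dim=1) + logdet
    loss = -log_px.mean()                      # train by maximizing likelihood
    print(loss.item())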

Exploring data augmentation in bias mitigation against non-native-accented speech

no code yet • 24 Dec 2023

We aim to mitigate the bias against non-native-accented Flemish in a Flemish ASR system.

Creating New Voices using Normalizing Flows

no code yet • 22 Dec 2023

As there is growing interest in synthesizing voices of new speakers, here we investigate the ability of normalizing flows in text-to-speech (TTS) and voice conversion (VC) modes to extrapolate from speakers observed during training to create unseen speaker identities.

SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention

no code yet • 14 Dec 2023

Zero-shot voice conversion (VC) aims to transfer the source speaker timbre to arbitrary unseen target speaker timbre, while keeping the linguistic content unchanged.
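
SEF-VC's implementation is not available here; the snippet only illustrates the general idea of speaker-embedding-free conditioning via cross-attention, with source content frames as queries and target-speech frames as keys and values. All shapes and dimensions are made up.

    # Cross-attention from source content frames to target-speech frames
    # (generic illustration of speaker-embedding-free conditioning, not SEF-VC itself).
    import torch
    import torch.nn as nn

    dim = 256
    attn = nn.MultiheadAttention(embed_dim=dim, num_heads=4, batch_first=True)

    content = torch.randn(1, 200, dim)   # queries: content features of the source speech
    target = torch.randn(1, 150, dim)    # keys/values: frames of the target speech
    mixed, _ = attn(query=content, key=target, value=target)
    print(mixed.shape)                   # torch.Size([1, 200, 256]): content frames
                                         # enriched with target-speaker information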

PerMod: Perceptually Grounded Voice Modification with Latent Diffusion Models

no code yet • 13 Dec 2023

Perceptual modification of voice is an elusive goal.

Vulnerability of Automatic Identity Recognition to Audio-Visual Deepfakes

no code yet • 29 Nov 2023

From the publicly available speech dataset LibriTTS, we also created a separate database of audio-only deepfakes, LibriTTS-DF, using several recent text-to-speech methods: YourTTS, Adaspeech, and TorToiSe.