Voice Conversion
149 papers with code • 2 benchmarks • 5 datasets
Voice Conversion is a technology that modifies a source speaker's speech so that it sounds like the speech of a target speaker while preserving the linguistic content.
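The core idea can be sketched in a toy form: keep the content of the utterance, swap the speaker identity. This is a hypothetical illustration, not a real VC model; the `Utterance` type and `convert_voice` function are invented for clarity.

```python
from dataclasses import dataclass

# Toy illustration of the voice-conversion contract (not a real model):
# the linguistic content is preserved, the speaker identity is replaced.

@dataclass
class Utterance:
    text: str        # linguistic content -- preserved by conversion
    speaker_id: str  # speaker identity -- replaced by conversion

def convert_voice(source: Utterance, target_speaker: str) -> Utterance:
    # Real systems do this with a content encoder, a target speaker
    # embedding, and a decoder/vocoder; the invariant is the same.
    return Utterance(text=source.text, speaker_id=target_speaker)

src = Utterance(text="hello world", speaker_id="alice")
out = convert_voice(src, target_speaker="bob")
```

In an actual system the "content" would be frame-level linguistic or semantic features rather than text, but the invariant shown here is what distinguishes voice conversion from text-to-speech.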
Latest papers with no code
PromptCodec: High-Fidelity Neural Speech Codec using Disentangled Representation Learning based Adaptive Feature-aware Prompt Encoders
Neural speech codecs have recently gained widespread attention in generative speech modeling domains such as voice conversion and text-to-speech synthesis.
Voice Conversion Augmentation for Speaker Recognition on Defective Datasets
Our experimental results on three constructed datasets demonstrate that VCA-NN effectively mitigates these dataset problems, offering a new direction for addressing speaker recognition from the data perspective.
PAVITS: Exploring Prosody-aware VITS for End-to-End Emotional Voice Conversion
In this paper, we propose Prosody-aware VITS (PAVITS) for emotional voice conversion (EVC), aiming to achieve the two major objectives of EVC: high content naturalness and high emotional naturalness, both crucial for meeting the demands of human perception.
Transcription and translation of videos using fine-tuned XLSR Wav2Vec2 on custom dataset and mBART
This research addresses the challenge of training an ASR model for personalized voices with minimal data.
Enhancing the Stability of LLM-based Speech Generation Systems through Self-Supervised Representations
Training LLMs for text-to-speech (TTS) on speaker-disentangled codes lets the LLM generate the content and style of the speech from text alone, much as humans do, while the speaker identity is supplied by the decoder of the VC model.
SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition
Existing speech language models typically utilize task-dependent prompt tokens to unify various speech tasks in a single model.
A Proactive and Dual Prevention Mechanism against Illegal Song Covers empowered by Singing Voice Conversion
To improve the imperceptibility of the perturbations, we refine a psychoacoustic-model-based loss with the backing track as an additional masker, an accompanying element unique to singing voices compared with ordinary speech.
Adversarial speech for voice privacy protection from Personalized Speech generation
For validation, we employ the open-source pre-trained YourTTS model for speech generation and protect the target speaker's speech in the white-box scenario.
StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion
Specifically, to enable streaming, StreamVoice employs a fully causal context-aware LM with a temporally independent acoustic predictor, alternately processing semantic and acoustic features at each autoregression step, which eliminates the dependence on complete source speech.
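The alternating-token scheme described above can be sketched as follows. This is a hypothetical simplification, not the actual StreamVoice code: `interleave_stream` and the toy predictor are invented names, and real models operate on learned token sequences, not integers.

```python
# Hypothetical sketch of causal semantic/acoustic interleaving: at each
# step the model consumes one semantic token, then predicts one acoustic
# token from the history alone -- so generation never has to wait for
# the complete source utterance.

def interleave_stream(semantic_tokens, predict_acoustic):
    history = []
    for s in semantic_tokens:          # semantic tokens arrive one at a time
        history.append(("sem", s))     # consume the semantic token first
        a = predict_acoustic(history)  # causal: sees only past tokens
        history.append(("ac", a))      # then append the acoustic token
        yield a                        # acoustic output streams out immediately

# Toy predictor for illustration: derives output from the latest token only.
toy_predict = lambda h: h[-1][1] * 2

converted = list(interleave_stream([1, 2, 3], toy_predict))
```

The key property this illustrates is causality: each acoustic prediction depends only on tokens already seen, which is what makes the conversion streamable.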
Transfer the linguistic representations from TTS to accent conversion with non-parallel data
This paper introduces a novel non-autoregressive framework for accent conversion that learns accent-agnostic linguistic representations and employs them to convert the accent in the source speech.