Voice Conversion

149 papers with code • 2 benchmarks • 5 datasets

Voice Conversion is a technology that modifies the speech of a source speaker so that it sounds like the speech of a target speaker, without changing the linguistic information.

Source: Joint training framework for text-to-speech and voice conversion using multi-source Tacotron and WaveNet
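Most voice conversion systems follow a disentangle-and-recombine pattern: extract speaker-independent linguistic content from the source utterance, extract a speaker identity representation from the target, and decode the two back into speech. The toy sketch below illustrates only this data flow; the encoder, embedding, and decoder are hypothetical stand-ins (simple linear maps over dummy features), not any real model's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def content_encoder(speech: np.ndarray) -> np.ndarray:
    """Toy stand-in: project frames into a 'content' space.
    A real encoder would strip speaker identity here."""
    W = np.ones((speech.shape[1], 8)) / speech.shape[1]
    return speech @ W

def speaker_embedding(speech: np.ndarray) -> np.ndarray:
    """Toy stand-in: a single vector summarizing the target voice."""
    return speech.mean(axis=0)

def decoder(content: np.ndarray, spk: np.ndarray) -> np.ndarray:
    """Toy stand-in: recombine linguistic content with speaker identity."""
    return content + spk[: content.shape[1]]

source = rng.normal(size=(100, 80))  # source frames: carry the words
target = rng.normal(size=(50, 80))   # target frames: carry the voice

converted = decoder(content_encoder(source), speaker_embedding(target))
print(converted.shape)  # one output frame per source frame: (100, 8)
```

The key property the sketch preserves is that the converted utterance keeps the source's frame count (its linguistic content) while the target contributes only a time-independent identity vector.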


Latest papers with no code

PromptCodec: High-Fidelity Neural Speech Codec using Disentangled Representation Learning based Adaptive Feature-aware Prompt Encoders

no code yet • 3 Apr 2024

Neural speech codecs have recently gained widespread attention in generative speech modeling domains such as voice conversion and text-to-speech synthesis.

Voice Conversion Augmentation for Speaker Recognition on Defective Datasets

no code yet • 1 Apr 2024

Our experimental results on three constructed datasets demonstrate that VCA-NN effectively mitigates these dataset problems, providing a new direction for addressing speaker recognition problems from the data perspective.

PAVITS: Exploring Prosody-aware VITS for End-to-End Emotional Voice Conversion

no code yet • 3 Mar 2024

In this paper, we propose Prosody-aware VITS (PAVITS) for emotional voice conversion (EVC), aiming to achieve two major objectives of EVC: high content naturalness and high emotional naturalness, which are crucial for meeting the demands of human perception.

Transcription and translation of videos using fine-tuned XLSR Wav2Vec2 on custom dataset and mBART

no code yet • 1 Mar 2024

This research addresses the challenge of training an ASR model for personalized voices with minimal data.

Enhancing the Stability of LLM-based Speech Generation Systems through Self-Supervised Representations

no code yet • 5 Feb 2024

Using speaker-disentangled codes to train LLMs for text-to-speech (TTS) allows the LLM to generate the content and the style of the speech only from the text, similarly to humans, while the speaker identity is provided by the decoder of the VC model.

SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition

no code yet • 31 Jan 2024

Existing speech language models typically utilize task-dependent prompt tokens to unify various speech tasks in a single model.

A Proactive and Dual Prevention Mechanism against Illegal Song Covers empowered by Singing Voice Conversion

no code yet • 30 Jan 2024

To improve the imperceptibility of perturbations, we refine a psychoacoustic model-based loss with the backing track as an additional masker, a unique accompanying element for singing voices compared to ordinary speech voices.

Adversarial speech for voice privacy protection from Personalized Speech generation

no code yet • 22 Jan 2024

For validation, we employ the open-source pre-trained YourTTS model for speech generation and protect the target speaker's speech in the white-box scenario.
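White-box adversarial protection of this kind generally perturbs the speech within a small budget so that a speaker encoder's embedding no longer matches the original voice, making the audio less useful for cloning. The sketch below shows only the generic FGSM-style step under an L-infinity budget; the linear "speaker encoder" and all names are illustrative assumptions, not the paper's method or the YourTTS API.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linear "speaker encoder": embedding = W @ waveform (illustrative only).
W = rng.normal(size=(16, 1000))
speech = rng.normal(size=1000)   # the speaker's clean utterance
ref = W @ speech                 # embedding a cloning system would try to match

# White-box FGSM-style step: push the embedding away from `ref` while keeping
# the waveform change within an L-infinity budget `eps` (imperceptibility).
eps = 0.01
start = speech + 1e-4 * rng.normal(size=1000)  # escape the zero-gradient point
grad = 2 * W.T @ (W @ start - ref)             # analytic grad of ||W x - ref||^2
adv = np.clip(speech + eps * np.sign(grad), speech - eps, speech + eps)

drift = np.linalg.norm(W @ adv - ref)
print(drift > 0.0)  # the protected audio's embedding has moved off-target
```

In a real system the gradient would come from backpropagating through a pretrained speaker encoder (e.g. via PyTorch autograd) rather than from an analytic linear model, and a psychoacoustic constraint would typically replace the plain clipping.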

StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion

no code yet • 19 Jan 2024

Specifically, to enable streaming, StreamVoice employs a fully causal, context-aware LM with a temporally independent acoustic predictor, alternately processing semantic and acoustic features at each autoregressive step, which eliminates the dependence on complete source speech.
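The alternation described above can be pictured as one causal token stream in which each frame's semantic token is followed by its acoustic token, so the model never needs frames that have not arrived yet. The layout below is a minimal illustration of that interleaving pattern; the token names and frame-level granularity are assumptions, not StreamVoice's actual format.

```python
# Per-frame tokens: semantic (from incoming source speech) and acoustic
# (predicted converted speech). Values are placeholders.
semantic = ["s0", "s1", "s2"]
acoustic = ["a0", "a1", "a2"]

# Interleave one semantic and one acoustic token per time step, so a causal
# LM conditioned on the prefix only ever sees past frames.
stream = []
for s, a in zip(semantic, acoustic):
    stream += [s, a]

print(stream)  # ['s0', 'a0', 's1', 'a1', 's2', 'a2']
```

Because the prefix at any position contains only tokens from earlier frames, generation can begin as soon as the first semantic token is available, which is what makes the scheme streamable.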

Transfer the linguistic representations from TTS to accent conversion with non-parallel data

no code yet • 7 Jan 2024

This paper introduces a novel non-autoregressive framework for accent conversion that learns accent-agnostic linguistic representations and employs them to convert the accent in the source speech.