Voice Conversion

149 papers with code • 2 benchmarks • 5 datasets

Voice Conversion is a technology that modifies the speech of a source speaker and makes their speech sound like that of another target speaker without changing the linguistic information.

Source: Joint training framework for text-to-speech and voice conversion using multi-source Tacotron and WaveNet

High-Fidelity Neural Phonetic Posteriorgrams

interactiveaudiolab/ppgs 27 Feb 2024

A phonetic posteriorgram (PPG) is a time-varying categorical distribution over acoustic units of speech (e.g., phonemes).

40
27 Feb 2024
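
To make the PPG definition in the entry above concrete, here is a minimal sketch of the data structure only, not the method from the listed paper. It treats a PPG as a frames-by-phonemes table in which each row is a probability distribution. The phoneme inventory, the logits, and the helper names (`softmax`, `to_ppg`, `most_likely_phonemes`) are all hypothetical and chosen for illustration.

```python
import math

# Hypothetical phoneme inventory; real PPG models use a much larger set.
PHONEMES = ["aa", "iy", "s", "t", "sil"]


def softmax(logits):
    """Turn one frame of raw scores into a categorical distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_ppg(frame_logits):
    """Map per-frame logits (frames x phonemes) to a PPG of the same shape."""
    return [softmax(frame) for frame in frame_logits]


def most_likely_phonemes(ppg):
    """Pick the highest-probability phoneme in each frame."""
    return [
        PHONEMES[max(range(len(frame)), key=frame.__getitem__)]
        for frame in ppg
    ]


# Made-up logits for three frames, standing in for an acoustic model's output.
example_logits = [
    [0.1, 0.2, 3.0, 0.0, 0.5],
    [2.5, 0.3, 0.1, 0.0, 0.2],
    [0.0, 0.1, 0.2, 2.0, 0.4],
]
example_ppg = to_ppg(example_logits)
```

Each row of `example_ppg` sums to 1, and calling `most_likely_phonemes(example_ppg)` reduces the distribution to one phoneme label per frame, discarding the uncertainty that the full PPG keeps.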

SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation

0nutation/speechgpt 24 Jan 2024

It comprises an autoregressive model based on LLM for semantic information modeling and a non-autoregressive model employing flow matching for perceptual information modeling.

889
24 Jan 2024

DurFlex-EVC: Duration-Flexible Emotional Voice Conversion with Parallel Generation

hs-oh-prml/durflexevc 16 Jan 2024

Emotional voice conversion (EVC) seeks to modify the emotional tone of a speaker's voice while preserving the original linguistic content and the speaker's unique vocal characteristics.

34
16 Jan 2024

AutoVisual Fusion Suite: A Comprehensive Evaluation of Image Segmentation and Voice Conversion Tools on HuggingFace Platform

amirrezahmi/video-inpainting-and-voice-cloning 17 Dec 2023

This study presents a comprehensive evaluation of tools available on the HuggingFace platform for two pivotal applications in artificial intelligence: image segmentation and voice conversion.

21
17 Dec 2023

What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection

cecile-hi/regularized-adaptive-weight-modification 15 Dec 2023

The rapid evolution of speech synthesis and voice conversion has raised substantial concerns due to the potential misuse of such technology, prompting a pressing need for effective audio deepfake detection mechanisms.

14
15 Dec 2023

HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis

sh-lee-prml/hierspeechpp 21 Nov 2023

Furthermore, we significantly improve the naturalness and speaker similarity of synthetic speech even in zero-shot speech synthesis scenarios.

1,061
21 Nov 2023

Improving fairness for spoken language understanding in atypical speech with Text-to-Speech

wanghelin1997/aty-tts 16 Nov 2023

Spoken language understanding (SLU) systems often exhibit suboptimal performance in processing atypical speech, typically caused by neurological conditions and motor impairments.

7
16 Nov 2023

CSLP-AE: A Contrastive Split-Latent Permutation Autoencoder Framework for Zero-Shot Electroencephalography Signal Conversion

andersxa/cslp-ae NeurIPS 2023

While the present work only considers conversion of EEG, the proposed CSLP-AE provides a general framework for signal conversion and extraction of content (task activation) and style (subject variability) components of general interest for the modeling and analysis of biological signals.

6
13 Nov 2023

Diff-HierVC: Diffusion-based Hierarchical Voice Conversion with Robust Pitch Generation and Masked Prior for Zero-shot Speaker Adaptation

hayeong0/Diff-HierVC 8 Nov 2023

Finally, by using the masked prior in diffusion models, our model can improve the speaker adaptation quality.

136
08 Nov 2023

Low-latency Real-time Voice Conversion on CPU

koeai/llvc 1 Nov 2023

To our knowledge, LLVC achieves both the lowest resource usage and the lowest latency of any open-source voice conversion model.

343
01 Nov 2023