Search Results for author: Ruibo Fu

Found 16 papers, 3 papers with code

Learning Speech Representation From Contrastive Token-Acoustic Pretraining

no code implementations • 1 Sep 2023 • Chunyu Qiang, Hao Li, Yixin Tian, Ruibo Fu, Tao Wang, Longbiao Wang, Jianwu Dang

However, existing contrastive learning methods in the audio field focus on extracting global descriptive information for downstream audio classification tasks, making them unsuitable for TTS, VC, and ASR tasks.

Audio Classification • Automatic Speech Recognition • +5

Minimally-Supervised Speech Synthesis with Conditional Diffusion Model and Language Model: A Comparative Study of Semantic Coding

no code implementations • 28 Jul 2023 • Chunyu Qiang, Hao Li, Hao Ni, He Qu, Ruibo Fu, Tao Wang, Longbiao Wang, Jianwu Dang

However, existing methods suffer from three problems: the high dimensionality and waveform distortion of discrete speech representations, the prosodic averaging problem caused by the duration prediction model in non-autoregressive frameworks, and the information redundancy and dimension explosion problems of existing semantic encoding methods.

Language Modelling • Speech Synthesis

Low-rank Adaptation Method for Wav2vec2-based Fake Audio Detection

no code implementations • 9 Jun 2023 • Chenglong Wang, Jiangyan Yi, Xiaohui Zhang, JianHua Tao, Le Xu, Ruibo Fu

Self-supervised speech models are a rapidly developing research topic in fake audio detection.

Adaptive Fake Audio Detection with Low-Rank Model Squeezing

no code implementations • 8 Jun 2023 • Xiaohui Zhang, Jiangyan Yi, JianHua Tao, Chenlong Wang, Le Xu, Ruibo Fu

During the inference stage, these adaptation matrices are combined with the existing model to generate the final prediction output.

UnifySpeech: A Unified Framework for Zero-shot Text-to-Speech and Voice Conversion

no code implementations • 10 Jan 2023 • Haogeng Liu, Tao Wang, Ruibo Fu, Jiangyan Yi, Zhengqi Wen, JianHua Tao

Text-to-speech (TTS) and voice conversion (VC) are two different tasks that both aim to generate high-quality speech from different input modalities.

Quantization • Voice Conversion

Emotion Selectable End-to-End Text-based Speech Editing

no code implementations • 20 Dec 2022 • Tao Wang, Jiangyan Yi, Ruibo Fu, JianHua Tao, Zhengqi Wen, Chu Yuan Zhang

To achieve this task, we propose Emo-CampNet (emotion CampNet), which can provide the option of emotional attributes for the generated speech in text-based speech editing and has the one-shot ability to edit unseen speakers' speech.

Data Augmentation

SceneFake: An Initial Dataset and Benchmarks for Scene Fake Audio Detection

1 code implementation • 11 Nov 2022 • Jiangyan Yi, Chenglong Wang, JianHua Tao, Chu Yuan Zhang, Cunhang Fan, Zhengkun Tian, Haoxin Ma, Ruibo Fu

This paper reports benchmark results for scene fake audio detection on the SceneFake dataset.

Speech Enhancement

An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era

no code implementations • 6 Oct 2022 • Andreas Triantafyllopoulos, Björn W. Schuller, Gökçe İymen, Metin Sezgin, Xiangheng He, Zijiang Yang, Panagiotis Tzirakis, Shuo Liu, Silvan Mertes, Elisabeth André, Ruibo Fu, JianHua Tao

Speech is the fundamental mode of human communication, and its synthesis has long been a core priority in human-computer interaction research.

Speech Synthesis • Text-To-Speech Synthesis

System Fingerprint Recognition for Deepfake Audio: An Initial Dataset and Investigation

no code implementations • 21 Aug 2022 • Xinrui Yan, Jiangyan Yi, Chenglong Wang, JianHua Tao, Junzuo Zhou, Hao Gu, Ruibo Fu

The rapid progress of deep speech synthesis models has posed significant threats to society such as malicious content manipulation.

Face Swapping • Speech Synthesis

An Initial Investigation for Detecting Vocoder Fingerprints of Fake Audio

no code implementations • 20 Aug 2022 • Xinrui Yan, Jiangyan Yi, JianHua Tao, Chenglong Wang, Haoxin Ma, Tao Wang, Shiming Wang, Ruibo Fu

Many effective attempts have been made at fake audio detection.

Fully Automated End-to-End Fake Audio Detection

no code implementations • 20 Aug 2022 • Chenglong Wang, Jiangyan Yi, JianHua Tao, Haiyang Sun, Xun Chen, Zhengkun Tian, Haoxin Ma, Cunhang Fan, Ruibo Fu

Existing fake audio detection systems often rely on expert experience to design the acoustic features or on manual tuning of the hyperparameters of the network structure.

NeuralDPS: Neural Deterministic Plus Stochastic Model with Multiband Excitation for Noise-Controllable Waveform Generation

no code implementations • 5 Mar 2022 • Tao Wang, Ruibo Fu, Jiangyan Yi, JianHua Tao, Zhengqi Wen

We have also verified through experiments that this method can effectively control the noise components in the predicted speech and adjust the SNR of speech.

CampNet: Context-Aware Mask Prediction for End-to-End Text-Based Speech Editing

1 code implementation • 21 Feb 2022 • Tao Wang, Jiangyan Yi, Ruibo Fu, JianHua Tao, Zhengqi Wen

It can solve unnatural prosody in the edited region and synthesize the speech corresponding to the unseen words in the transcript.

Few-Shot Learning • Sentence

ADD 2022: the First Audio Deep Synthesis Detection Challenge

no code implementations • 17 Feb 2022 • Jiangyan Yi, Ruibo Fu, JianHua Tao, Shuai Nie, Haoxin Ma, Chenglong Wang, Tao Wang, Zhengkun Tian, Ye Bai, Cunhang Fan, Shan Liang, Shiming Wang, Shuai Zhang, Xinrui Yan, Le Xu, Zhengqi Wen, Haizhou Li, Zheng Lian, Bin Liu

Audio deepfake detection is an emerging topic, which was included in ASVspoof 2021.

Audio Generation • DeepFake Detection • +1

Singing-Tacotron: Global duration control attention and dynamic filter for End-to-end singing voice synthesis

no code implementations • 16 Feb 2022 • Tao Wang, Ruibo Fu, Jiangyan Yi, JianHua Tao, Zhengqi Wen

Firstly, we propose a global duration control attention mechanism for the SVS model.

Singing Voice Synthesis

Half-Truth: A Partially Fake Audio Detection Dataset

1 code implementation • 8 Apr 2021 • Jiangyan Yi, Ye Bai, JianHua Tao, Haoxin Ma, Zhengkun Tian, Chenglong Wang, Tao Wang, Ruibo Fu

Therefore, this paper develops such a dataset for half-truth audio detection (HAD).

Speech Synthesis
