Search Results for author: Jinlong Xue

Found 7 papers, 3 papers with code

Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation

1 code implementation • 2 Jan 2024 • Jinlong Xue, Yayue Deng, Yingming Gao, Ya Li

Drawing inspiration from state-of-the-art Text-to-Image (T2I) diffusion models, we introduce Auffusion, a Text-to-Audio (TTA) system that adapts T2I model frameworks to the TTA task by leveraging their inherent generative strengths and precise cross-modal alignment.

Ranked #5 on Audio Generation on AudioCaps

Audio Generation, Style Transfer

Frame-level emotional state alignment method for speech emotion recognition

1 code implementation • 27 Dec 2023 • Qifei Li, Yingming Gao, Cong Wang, Yayue Deng, Jinlong Xue, Yichen Han, Ya Li

To address this problem, we propose a frame-level emotional state alignment method for SER.

Speech Emotion Recognition

CONCSS: Contrastive-based Context Comprehension for Dialogue-appropriate Prosody in Conversational Speech Synthesis

no code implementations • 16 Dec 2023 • Yayue Deng, Jinlong Xue, Yukang Jia, Qifei Li, Yichen Han, Fengping Wang, Yingming Gao, Dengfeng Ke, Ya Li

In this paper, we introduce a contrastive learning-based CSS framework, CONCSS.

Contrastive Learning, Self-Supervised Learning, +1

Rhythm-controllable Attention with High Robustness for Long Sentence Speech Synthesis

no code implementations • 5 Jun 2023 • Dengfeng Ke, Yayue Deng, Yukang Jia, Jinlong Xue, Qi Luo, Ya Li, Jianqing Sun, Jiaen Liang, Binghuai Lin

Regressive Text-to-Speech (TTS) systems use an attention mechanism to generate an alignment between the text and the acoustic feature sequence.

Sentence, Speech Synthesis

M2-CTTS: End-to-End Multi-scale Multi-modal Conversational Text-to-Speech Synthesis

no code implementations • 3 May 2023 • Jinlong Xue, Yayue Deng, Fengping Wang, Ya Li, Yingming Gao, JianHua Tao, Jianqing Sun, Jiaen Liang

However, it is still a challenge to comprehensively model the conversation, and a majority of conversational TTS systems only focus on extracting global information and omit local prosody features, which contain important fine-grained information like keywords and emphasis.

Speech Synthesis, Text-To-Speech Synthesis

A Keypoint Based Enhancement Method for Audio Driven Free View Talking Head Synthesis

no code implementations • 7 Oct 2022 • Yichen Han, Ya Li, Yingming Gao, Jinlong Xue, Songpo Wang, Lei Yang

Then we used keypoint decomposition to extract video synthesis controlling parameters from the backend output and the source image.

ECAPA-TDNN for Multi-speaker Text-to-speech Synthesis

1 code implementation • 20 Mar 2022 • Jinlong Xue, Yayue Deng, Yichen Han, Ya Li, Jianqing Sun, Jiaen Liang

In recent years, neural network based methods for multi-speaker text-to-speech synthesis (TTS) have made significant progress.

Speaker Verification, Speech Synthesis, +1
