Search Results for author: Changhe Song

Found 12 papers, 4 papers with code

3D-Speaker-Toolkit: An Open Source Toolkit for Multi-modal Speaker Verification and Diarization

1 code implementation • 29 Mar 2024 • Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Tinglong Zhu, Changhe Song, Rongjie Huang, Ziyang Ma, Qian Chen, Shiliang Zhang, Xihao Li

This paper introduces 3D-Speaker-Toolkit, an open source toolkit for multi-modal speaker verification and diarization.

Self-Supervised Learning speaker-diarization +3


Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation

no code implementations • 4 Sep 2023 • Jiaxu Zhu, Weinan Tong, Yaoxun Xu, Changhe Song, Zhiyong Wu, Zhao You, Dan Su, Dong Yu, Helen Meng

Mapping two modalities, speech and text, into a shared representation space is a research direction for using text-only data to improve end-to-end automatic speech recognition (ASR) performance in new domains.

Automatic Speech Recognition (ASR) +2


SememeASR: Boosting Performance of End-to-End Speech Recognition against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge

no code implementations • 4 Sep 2023 • Jiaxu Zhu, Changhe Song, Zhiyong Wu, Helen Meng

Recently, excellent progress has been made in speech recognition.

Domain Generalization speech-recognition +1


Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information

no code implementations • 31 Aug 2023 • Jie Chen, Changhe Song, Deyi Tuo, Xixin Wu, Shiyin Kang, Zhiyong Wu, Helen Meng

For text-to-speech (TTS) synthesis, prosodic structure prediction (PSP) plays an important role in producing natural and intelligible speech.

Multi-Task Learning


Towards Cross-speaker Reading Style Transfer on Audiobook Dataset

no code implementations • 10 Aug 2022 • Xiang Li, Changhe Song, Xianhao Wei, Zhiyong Wu, Jia Jia, Helen Meng

This paper aims to introduce a chunk-wise multi-scale cross-speaker style model to capture both the global genre and the local prosody in audiobook speeches.

Style Transfer


An End-to-end Chinese Text Normalization Model based on Rule-guided Flat-Lattice Transformer

1 code implementation • 31 Mar 2022 • Wenlin Dai, Changhe Song, Xiang Li, Zhiyong Wu, Huashan Pan, Xiulin Li, Helen Meng

Inspired by Flat-LAttice Transformer (FLAT), we propose an end-to-end Chinese text normalization model that accepts Chinese characters as direct input and integrates expert knowledge contained in rules into the neural network; both contribute to the superior performance of the proposed model on the text normalization task.


A Character-level Span-based Model for Mandarin Prosodic Structure Prediction

1 code implementation • 31 Mar 2022 • Xueyuan Chen, Changhe Song, Yixuan Zhou, Zhiyong Wu, Changbin Chen, Zhongqin Wu, Helen Meng

In this paper, we propose a span-based Mandarin prosodic structure prediction model to obtain an optimal prosodic structure tree, which can be converted to the corresponding prosodic label sequence.

Sentence


Disentangleing Content and Fine-grained Prosody Information via Hybrid ASR Bottleneck Features for Voice Conversion

no code implementations • 24 Mar 2022 • Xintao Zhao, Feng Liu, Changhe Song, Zhiyong Wu, Shiyin Kang, Deyi Tuo, Helen Meng

In this paper, we propose an any-to-one VC method using hybrid bottleneck features extracted from CTC-BNFs and CE-BNFs to complement each other's advantages.

Automatic Speech Recognition (ASR) +2


Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis

no code implementations • 14 Apr 2021 • Yixuan Zhou, Changhe Song, Jingbei Li, Zhiyong Wu, Yanyao Bian, Dan Su, Helen Meng

Exploiting rich linguistic information in raw text is crucial for expressive text-to-speech (TTS).

Dependency Parsing Representation Learning +3


Towards Multi-Scale Style Control for Expressive Speech Synthesis

no code implementations • 8 Apr 2021 • Xiang Li, Changhe Song, Jingbei Li, Zhiyong Wu, Jia Jia, Helen Meng

This paper introduces a multi-scale speech style modeling method for end-to-end expressive speech synthesis.

Expressive Speech Synthesis Style Transfer


Syntactic representation learning for neural network based TTS with syntactic parse tree traversal

no code implementations • 13 Dec 2020 • Changhe Song, Jingbei Li, Yixuan Zhou, Zhiyong Wu, Helen Meng

Meanwhile, nuclear-norm maximization loss is introduced to enhance the discriminability and diversity of the embeddings of constituent labels.

Representation Learning Sentence


CED: Credible Early Detection of Social Media Rumors

1 code implementation • 10 Nov 2018 • Changhe Song, Cunchao Tu, Cheng Yang, Zhiyuan Liu, Maosong Sun

By regarding all reposts to a rumor candidate as a sequence, the proposed model will seek an early point-in-time for making a credible prediction.

Social and Information Networks
