Search Results for author: Xie Chen

Found 44 papers, 9 papers with code

The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge

no code implementations • 9 Apr 2024 • Yiwei Guo, Chenrun Wang, Yifan Yang, Hankun Wang, Ziyang Ma, Chenpeng Du, Shuai Wang, Hanzheng Li, Shuai Fan, Hui Zhang, Xie Chen, Kai Yu

Discrete speech tokens have become increasingly popular across speech processing fields, including automatic speech recognition (ASR), text-to-speech (TTS), and singing voice synthesis (SVS).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Quantum State Generation with Structure-Preserving Diffusion Model

no code implementations • 9 Apr 2024 • Yuchen Zhu, Tianrong Chen, Evangelos A. Theodorou, Xie Chen, Molei Tao

This article considers the generative modeling of the states of quantum systems and proposes an approach based on denoising diffusion models.

Denoising

An Embarrassingly Simple Approach for LLM with Strong ASR Capacity

no code implementations • 13 Feb 2024 • Ziyang Ma, Guanrou Yang, Yifan Yang, Zhifu Gao, Jiaming Wang, Zhihao Du, Fan Yu, Qian Chen, Siqi Zheng, Shiliang Zhang, Xie Chen

We found that delicate designs are not necessary: an embarrassingly simple composition of an off-the-shelf speech encoder, an LLM, and a single trainable linear projector is competent for the ASR task (see the sketch below).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
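
The composition described in the abstract is simple enough to sketch directly. Below is a minimal illustration, assuming hypothetical speech_encoder and llm modules and placeholder dimensions; it is a sketch of the stated recipe, not the paper's released code.

    import torch
    import torch.nn as nn

    class SpeechLLMASR(nn.Module):
        # The off-the-shelf speech encoder and LLM are frozen; the linear
        # projector is the only trainable component, mapping encoder
        # features into the LLM embedding space. Sizes are hypothetical.
        def __init__(self, speech_encoder, llm, enc_dim=1024, llm_dim=4096):
            super().__init__()
            self.speech_encoder = speech_encoder
            self.llm = llm
            for module in (self.speech_encoder, self.llm):
                for p in module.parameters():
                    p.requires_grad = False
            self.projector = nn.Linear(enc_dim, llm_dim)

        def forward(self, speech, prompt_embeds):
            feats = self.speech_encoder(speech)    # (B, T, enc_dim)
            speech_embeds = self.projector(feats)  # (B, T, llm_dim)
            # Prepend the projected speech to the text prompt; the LLM
            # then decodes the transcript autoregressively.
            return self.llm(torch.cat([speech_embeds, prompt_embeds], dim=1))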

BAT: Learning to Reason about Spatial Sounds with Large Language Models

no code implementations • 2 Feb 2024 • Zhisheng Zheng, Puyuan Peng, Ziyang Ma, Xie Chen, Eunsol Choi, David Harwath

By integrating Spatial-AST with the LLaMA-2 7B model, BAT transcends standard Sound Event Localization and Detection (SELD) tasks, enabling the model to reason about the relationships between the sounds in its environment.

Event Detection Language Modelling +5

VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech

no code implementations • 25 Jan 2024 • Chenpeng Du, Yiwei Guo, Hankun Wang, Yifan Yang, Zhikang Niu, Shuai Wang, Hui Zhang, Xie Chen, Kai Yu

Recent TTS models with decoder-only Transformer architecture, such as SPEAR-TTS and VALL-E, achieve impressive naturalness and demonstrate the ability for zero-shot adaptation given a speech prompt.

Hallucination

ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided Sequence Reordering

no code implementations • 14 Jan 2024 • Yakun Song, Zhuo Chen, Xiaofei Wang, Ziyang Ma, Xie Chen

The language model (LM) approach based on acoustic and linguistic prompts, such as VALL-E, has achieved remarkable progress in the field of zero-shot audio generation.

Audio Generation Language Modelling

EAT: Self-Supervised Pre-Training with Efficient Audio Transformer

1 code implementation • 7 Jan 2024 • Wenxi Chen, Yuzhe Liang, Ziyang Ma, Zhisheng Zheng, Xie Chen

Audio self-supervised learning (SSL) pre-training, which aims to learn good representations from unlabeled audio, has made remarkable progress.

Self-Supervised Learning

emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

2 code implementations • 23 Dec 2023 • Ziyang Ma, Zhisheng Zheng, Jiaxin Ye, Jinchao Li, Zhifu Gao, Shiliang Zhang, Xie Chen

To the best of our knowledge, emotion2vec is the first universal representation model in various emotion-related tasks, filling a gap in the field.

Self-Supervised Learning Sentiment Analysis +1

SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention

no code implementations • 14 Dec 2023 • Junjie Li, Yiwei Guo, Xie Chen, Kai Yu

Zero-shot voice conversion (VC) aims to convert the source speaker's timbre to that of an arbitrary unseen target speaker while keeping the linguistic content unchanged.

Position Voice Conversion

Expressive TTS Driven by Natural Language Prompts Using Few Human Annotations

no code implementations • 2 Nov 2023 • Hanglei Zhang, Yiwei Guo, Sen Liu, Xie Chen, Kai Yu

The LLM selects the best-matching style references from annotated utterances based on external style prompts, which can be raw input text or natural language style descriptions.

Language Modelling Large Language Model +1

Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition

no code implementations • 19 Sep 2023 • Ziyang Ma, Wen Wu, Zhisheng Zheng, Yiwei Guo, Qian Chen, Shiliang Zhang, Xie Chen

In this paper, we explored how to boost speech emotion recognition (SER) with a state-of-the-art speech pre-trained model (PTM), data2vec; a text generation technique, GPT-4; and a speech synthesis technique, Azure TTS.

Data Augmentation Language Modelling +5

Improved Factorized Neural Transducer Model For text-only Domain Adaptation

no code implementations • 18 Sep 2023 • Junzhe Liu, Jianwei Yu, Xie Chen

End-to-end models, such as the neural Transducer, have been successful in integrating acoustic and linguistic information jointly to achieve excellent recognition performance.

Domain Adaptation

Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS

1 code implementation • 14 Sep 2023 • Yifan Yang, Feiyu Shen, Chenpeng Du, Ziyang Ma, Kai Yu, Daniel Povey, Xie Chen

The proficiency of self-supervised learning (SSL) in speech-related tasks has driven research into utilizing discrete tokens for speech tasks such as recognition and translation, as they offer lower storage requirements and great potential for employing natural language processing techniques.

Self-Supervised Learning speech-recognition +2

Incorporating Class-based Language Model for Named Entity Recognition in Factorized Neural Transducer

no code implementations • 14 Sep 2023 • Peng Wang, Yifan Yang, Zheng Liang, Tian Tan, Shiliang Zhang, Xie Chen

Despite the excellent strides made by end-to-end (E2E) models in speech recognition in recent years, named entity recognition remains challenging but critical for semantic understanding.

Language Modelling named-entity-recognition +3

VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching

no code implementations • 10 Sep 2023 • Yiwei Guo, Chenpeng Du, Ziyang Ma, Xie Chen, Kai Yu

Although diffusion models in text-to-speech have become a popular choice due to their strong generative ability, the intrinsic complexity of sampling from diffusion models harms their efficiency.
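
Rectified flow matching, named in the title, trains a velocity field along straight-line paths between noise and data, which is what allows sampling in far fewer steps than a diffusion model. A minimal sketch of the training loss follows, assuming a hypothetical velocity_net(x_t, t) model; this is not VoiceFlow's code.

    import torch
    import torch.nn.functional as F

    def rectified_flow_loss(velocity_net, data):
        # Straight-line path x_t = (1 - t) * noise + t * data; the target
        # velocity (data - noise) is constant along the path.
        noise = torch.randn_like(data)
        t = torch.rand(data.shape[0], *([1] * (data.dim() - 1)))
        x_t = (1 - t) * noise + t * data
        pred = velocity_net(x_t, t.flatten())
        return F.mse_loss(pred, data - noise)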

Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition

no code implementations • 28 Aug 2023 • Zhisheng Zheng, Ziyang Ma, Yu Wang, Xie Chen

In recent years, speech-based self-supervised learning (SSL) has made significant progress in various tasks, including automatic speech recognition (ASR).

Active Learning Automatic Speech Recognition +3

DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech

no code implementations • 25 Jun 2023 • Sen Liu, Yiwei Guo, Chenpeng Du, Xie Chen, Kai Yu

Although high-fidelity speech can be obtained for intralingual speech synthesis, cross-lingual text-to-speech (CTTS) is still far from satisfactory, as it is difficult to accurately retain the speaker timbres (i.e., speaker similarity) and eliminate the accents from their first language (i.e., nativeness).

Speech Synthesis

Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation

1 code implementation • 15 Jun 2023 • Ziyang Ma, Zhisheng Zheng, Guanrou Yang, Yu Wang, Chao Zhang, Xie Chen

Our models outperform other SSL models significantly on the LibriSpeech benchmark without the need for iterative re-clustering and re-training.

Automatic Speech Recognition Clustering +4

Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation

no code implementations • 14 Jun 2023 • Zheng Liang, Zheshu Song, Ziyang Ma, Chenpeng Du, Kai Yu, Xie Chen

Recently, end-to-end (E2E) automatic speech recognition (ASR) models have made great strides and exhibit excellent performance in general speech recognition.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

Blank-regularized CTC for Frame Skipping in Neural Transducer

1 code implementation • 19 May 2023 • Yifan Yang, Xiaoyu Yang, Liyong Guo, Zengwei Yao, Wei Kang, Fangjun Kuang, Long Lin, Xie Chen, Daniel Povey

Neural Transducer and connectionist temporal classification (CTC) are popular end-to-end automatic speech recognition systems.

Automatic Speech Recognition speech-recognition +1

DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder

no code implementations • 30 Mar 2023 • Chenpeng Du, Qi Chen, Xie Chen, Kai Yu

Additionally, we propose a novel method for generating continuous video frames with the DDIM image decoder trained on individual frames, eliminating the need for modelling the joint distribution of consecutive frames directly.

Talking Face Generation

Front-End Adapter: Adapting Front-End Input of Speech based Self-Supervised Learning for Speech Recognition

no code implementations • 18 Feb 2023 • Xie Chen, Ziyang Ma, Changli Tang, Yujin Wang, Zhisheng Zheng

However, the training of SSL models is computationally expensive and a common practice is to fine-tune a released SSL model on the specific task.

Self-Supervised Learning speech-recognition +1

LongFNT: Long-form Speech Recognition with Factorized Neural Transducer

no code implementations • 17 Nov 2022 • Xun Gong, Yu Wu, Jinyu Li, Shujie Liu, Rui Zhao, Xie Chen, Yanmin Qian

This motivates us to leverage the factorized neural transducer structure, which contains a real language model, the vocabulary predictor.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

EmoDiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance

no code implementations • 17 Nov 2022 • Yiwei Guo, Chenpeng Du, Xie Chen, Kai Yu

Specifically, instead of being guided with a one-hot vector for the specified emotion, EmoDiff is guided with a soft label in which the values for the specified emotion and Neutral are set to α and 1-α, respectively (see the guidance formula below).

Denoising
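
Read as standard classifier guidance with a soft target, the guided score would take roughly the form below, where x_t is the noisy sample, e the specified emotion, and γ the guidance scale; this is an assumed formulation for illustration, and the paper's exact weighting may differ.

    \nabla_{x_t}\log\tilde{p}(x_t) = \nabla_{x_t}\log p(x_t)
        + \gamma\left[\alpha\,\nabla_{x_t}\log p(e \mid x_t)
        + (1-\alpha)\,\nabla_{x_t}\log p(\text{Neutral} \mid x_t)\right]

Setting α = 1 recovers ordinary one-hot guidance, while intermediate values of α trade the specified emotion against Neutral, which is what makes the intensity controllable.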

An Adapter based Multi-label Pre-training for Speech Separation and Enhancement

no code implementations • 11 Nov 2022 • Tianrui Wang, Xie Chen, Zhuo Chen, Shu Yu, Weibin Zhu

In recent years, self-supervised learning (SSL) has achieved tremendous success in various speech tasks due to its power to extract representations from massive unlabeled data.

Denoising Pseudo Label +4

VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature

no code implementations • 2 Apr 2022 • Chenpeng Du, Yiwei Guo, Xie Chen, Kai Yu

The mainstream neural text-to-speech (TTS) pipeline is a cascade system comprising an acoustic model (AM) that predicts acoustic features from the input transcript and a vocoder that generates the waveform from those features (see the sketch below).

Speech Synthesis Text-To-Speech Synthesis
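
The cascade interface described above reduces to two stages with acoustic features as the intermediate representation. A minimal sketch with hypothetical acoustic_model and vocoder callables, not VQTTS's components:

    def synthesize(transcript, acoustic_model, vocoder):
        # Stage 1: the acoustic model maps text to acoustic features
        # (a mel-spectrogram in the conventional pipeline; VQTTS instead
        # uses self-supervised VQ acoustic features, per the title).
        acoustic_features = acoustic_model(transcript)
        # Stage 2: the vocoder maps acoustic features to a waveform.
        return vocoder(acoustic_features)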

Factorized Neural Transducer for Efficient Language Model Adaptation

1 code implementation • 27 Sep 2021 • Xie Chen, Zhong Meng, Sarangarajan Parthasarathy, Jinyu Li

In recent years, end-to-end (E2E) based automatic speech recognition (ASR) systems have achieved great success due to their simplicity and promising performance.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition

no code implementations • 4 Jun 2021 • Zhong Meng, Yu Wu, Naoyuki Kanda, Liang Lu, Xie Chen, Guoli Ye, Eric Sun, Jinyu Li, Yifan Gong

In this work, we perform LM fusion in the minimum WER (MWER) training of an E2E model to obviate the need for LM weight tuning during inference.

Language Modelling speech-recognition +1

Internal Language Model Training for Domain-Adaptive End-to-End Speech Recognition

no code implementations • 2 Feb 2021 • Zhong Meng, Naoyuki Kanda, Yashesh Gaur, Sarangarajan Parthasarathy, Eric Sun, Liang Lu, Xie Chen, Jinyu Li, Yifan Gong

The efficacy of external language model (LM) integration with existing end-to-end (E2E) automatic speech recognition (ASR) systems can be improved significantly using the internal language model estimation (ILME) method.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition

no code implementations • 3 Nov 2020 • Zhong Meng, Sarangarajan Parthasarathy, Eric Sun, Yashesh Gaur, Naoyuki Kanda, Liang Lu, Xie Chen, Rui Zhao, Jinyu Li, Yifan Gong

The integration of external language models (LMs) remains a challenging task for end-to-end (E2E) automatic speech recognition (ASR), which has no clear division between acoustic and language models.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

LSTM-LM with Long-Term History for First-Pass Decoding in Conversational Speech Recognition

no code implementations • 21 Oct 2020 • Xie Chen, Sarangarajan Parthasarathy, William Gale, Shuangyu Chang, Michael Zeng

The context information is captured by the hidden states of LSTM-LMs carried across utterances and can be used to guide the first-pass search effectively (see the sketch below).

speech-recognition Speech Recognition
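
A minimal sketch of the mechanism, assuming a hypothetical PyTorch LSTM-LM rather than the paper's implementation; the point is only that the recurrent state persists across utterance boundaries, so earlier turns condition the word scores of later ones.

    import torch.nn as nn

    class CrossUtteranceLSTMLM(nn.Module):
        # Vocabulary and hidden sizes are hypothetical placeholders.
        def __init__(self, vocab_size=10000, dim=512):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, dim)
            self.lstm = nn.LSTM(dim, dim, batch_first=True)
            self.out = nn.Linear(dim, vocab_size)

        def score_session(self, utterances):
            state = None                  # carried across utterances
            all_logits = []
            for utt in utterances:        # each utt: (1, T) token ids
                h, state = self.lstm(self.embed(utt), state)
                all_logits.append(self.out(h))
            return all_logits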

Memory-Efficient Pipeline-Parallel DNN Training

1 code implementation • 16 Jun 2020 • Deepak Narayanan, Amar Phanishayee, Kaiyu Shi, Xie Chen, Matei Zaharia

Many state-of-the-art ML results have been obtained by scaling up the number of parameters in existing models.

Long-span language modeling for speech recognition

no code implementations • 11 Nov 2019 • Sarangarajan Parthasarathy, William Gale, Xie Chen, George Polovets, Shuangyu Chang

We conduct language modeling and speech recognition experiments on the publicly available LibriSpeech corpus.

Language Modelling Re-Ranking +3

Neural Network Language Modeling with Letter-based Features and Importance Sampling

no code implementations • ICASSP 2018 • Hainan Xu, Ke Li, Yiming Wang, Jian Wang, Shiyin Kang, Xie Chen, Daniel Povey, Sanjeev Khudanpur

In this paper we describe an extension of the Kaldi software toolkit to support neural-based language modeling, intended for use in automatic speech recognition (ASR) and related tasks.

Ranked #36 on Speech Recognition on LibriSpeech test-other (using extra training data)

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Phonetic and Graphemic Systems for Multi-Genre Broadcast Transcription

no code implementations • 1 Feb 2018 • Yu Wang, Xie Chen, Mark Gales, Anton Ragni, Jeremy Wong

As the combination approaches become more complicated, the difference between the phonetic and graphemic systems further decreases.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Future Word Contexts in Neural Network Language Models

no code implementations • 18 Aug 2017 • Xie Chen, Xunying Liu, Anton Ragni, Yu Wang, Mark Gales

Instead of using a recurrent unit to capture the complete future word context, a feedforward unit is used to model a finite number of succeeding future words (see the sketch below).

speech-recognition Speech Recognition
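
A minimal sketch of the idea, with hypothetical names and sizes rather than the paper's architecture: history is modelled recurrently, while a feedforward unit summarizes a fixed window of k succeeding words.

    import torch
    import torch.nn as nn

    class FutureContextLM(nn.Module):
        def __init__(self, vocab_size=10000, dim=256, k=3):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, dim)
            self.history_rnn = nn.LSTM(dim, dim, batch_first=True)
            self.future_ff = nn.Sequential(nn.Linear(k * dim, dim), nn.Tanh())
            self.out = nn.Linear(2 * dim, vocab_size)

        def forward(self, past_tokens, future_tokens):
            # past_tokens: (B, T) history; future_tokens: (B, k) words ahead.
            h, _ = self.history_rnn(self.embed(past_tokens))
            f = self.future_ff(self.embed(future_tokens).flatten(1))
            # Predict the current word from the last history state plus
            # the feedforward summary of the future window.
            return self.out(torch.cat([h[:, -1], f], dim=-1))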
