no code implementations • 2 Feb 2024 • Zhisheng Zheng, Puyuan Peng, Ziyang Ma, Xie Chen, Eunsol Choi, David Harwath
By integrating Spatial-AST with the LLaMA-2 7B model, BAT transcends standard Sound Event Localization and Detection (SELD) tasks, enabling the model to reason about the relationships between the sounds in its environment.
1 code implementation • 7 Jan 2024 • Wenxi Chen, Yuzhe Liang, Ziyang Ma, Zhisheng Zheng, Xie Chen
Audio self-supervised learning (SSL) pre-training, which aims to learn good representations from unlabeled audio, has made remarkable progress.
2 code implementations • 23 Dec 2023 • Ziyang Ma, Zhisheng Zheng, Jiaxin Ye, Jinchao Li, Zhifu Gao, Shiliang Zhang, Xie Chen
To the best of our knowledge, emotion2vec is the first universal representation model in various emotion-related tasks, filling a gap in the field.
1 code implementation • 25 Sep 2023 • Guanrou Yang, Ziyang Ma, Zhisheng Zheng, Yakun Song, Zhikang Niu, Xie Chen
Recent years have witnessed significant advancements in self-supervised learning (SSL) methods for speech-processing tasks.
no code implementations • 19 Sep 2023 • Ziyang Ma, Wen Wu, Zhisheng Zheng, Yiwei Guo, Qian Chen, Shiliang Zhang, Xie Chen
In this paper, we explore how to boost speech emotion recognition (SER) using a state-of-the-art speech pre-trained model (PTM), data2vec; a text generation technique, GPT-4; and a speech synthesis technique, Azure TTS.
no code implementations • 28 Aug 2023 • Zhisheng Zheng, Ziyang Ma, Yu Wang, Xie Chen
In recent years, speech-based self-supervised learning (SSL) has made significant progress in various tasks, including automatic speech recognition (ASR).
1 code implementation • 15 Jun 2023 • Ziyang Ma, Zhisheng Zheng, Guanrou Yang, Yu Wang, Chao Zhang, Xie Chen
Our models outperform other SSL models significantly on the LibriSpeech benchmark without the need for iterative re-clustering and re-training.
no code implementations • 18 Feb 2023 • Xie Chen, Ziyang Ma, Changli Tang, Yujin Wang, Zhisheng Zheng
However, training SSL models is computationally expensive, and a common practice is to fine-tune a released SSL model on the specific task.
1 code implementation • 14 Nov 2022 • Ziyang Ma, Zhisheng Zheng, Changli Tang, Yujin Wang, Xie Chen
In this paper, we provide a new perspective on self-supervised speech models based on how the training targets are obtained.
Ranked #40 on Speech Recognition on LibriSpeech test-other
no code implementations • 27 Oct 2022 • Yujin Wang, Changli Tang, Ziyang Ma, Zhisheng Zheng, Xie Chen, Wei-Qiang Zhang
Recent years have witnessed great strides in self-supervised learning (SSL) for speech processing.
Automatic Speech Recognition (ASR)