Search Results for author: Changli Tang

Found 6 papers, 3 papers with code

SALMONN: Towards Generic Hearing Abilities for Large Language Models

1 code implementation • 20 Oct 2023 • Changli Tang, Wenyi Yu, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang

Hearing is arguably an essential ability of artificial intelligence (AI) agents in the physical world, which refers to the perception and understanding of general auditory information consisting of at least three types of sounds: speech, audio events, and music.

Audio captioning Automatic Speech Recognition +10

798

Paper
Code

Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models

2 code implementations • 9 Oct 2023 • Guangzhi Sun, Wenyi Yu, Changli Tang, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang

Audio-visual large language models (LLM) have drawn significant attention, yet the fine-grained combination of both input streams is rather under-explored, which is challenging but necessary for LLMs to understand general video inputs.

Question Answering Video Question Answering

Paper
Code

Connecting Speech Encoder and Large Language Model for ASR

no code implementations • 25 Sep 2023 • Wenyi Yu, Changli Tang, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang

Q-Former-based LLMs can generalise well to out-of-domain datasets, where 12% relative WER reductions over the Whisper baseline ASR model were achieved on the Eval2000 test set without using any in-domain training data from Switchboard.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

Front-End Adapter: Adapting Front-End Input of Speech based Self-Supervised Learning for Speech Recognition

no code implementations • 18 Feb 2023 • Xie Chen, Ziyang Ma, Changli Tang, Yujin Wang, Zhisheng Zheng

However, the training of SSL models is computationally expensive and a common practice is to fine-tune a released SSL model on the specific task.

Self-Supervised Learning speech-recognition +1

Paper
Add Code

MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets

1 code implementation • 14 Nov 2022 • Ziyang Ma, Zhisheng Zheng, Changli Tang, Yujin Wang, Xie Chen

In this paper, we provide a new perspective on self-supervised speech models from how the training targets are obtained.

Ranked #40 on Speech Recognition on LibriSpeech test-other

Automatic Speech Recognition Multi-Task Learning +3

Paper
Code

Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition

no code implementations • 27 Oct 2022 • Yujin Wang, Changli Tang, Ziyang Ma, Zhisheng Zheng, Xie Chen, Wei-Qiang Zhang

Recent years have witnessed great strides in self-supervised learning (SSL) on the speech processing.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.