Search Results for author: Zhihao Du

Found 12 papers, 7 papers with code

An Embarrassingly Simple Approach for LLM with Strong ASR Capacity

no code implementations • 13 Feb 2024 • Ziyang Ma, Guanrou Yang, Yifan Yang, Zhifu Gao, JiaMing Wang, Zhihao Du, Fan Yu, Qian Chen, Siqi Zheng, Shiliang Zhang, Xie Chen

We found that delicate designs are not necessary: an embarrassingly simple composition of an off-the-shelf speech encoder, an LLM, and a linear projector as the only trainable component is competent for the ASR task.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +1

LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT

1 code implementation • 7 Oct 2023 • JiaMing Wang, Zhihao Du, Qian Chen, Yunfei Chu, Zhifu Gao, Zerui Li, Kai Hu, Xiaohuan Zhou, Jin Xu, Ziyang Ma, Wen Wang, Siqi Zheng, Chang Zhou, Zhijie Yan, Shiliang Zhang

In this paper, we propose LauraGPT, a unified GPT model for audio recognition, understanding, and generation.

Audio captioning • Automatic Speech Recognition • +11

FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec

1 code implementation • 14 Sep 2023 • Zhihao Du, Shiliang Zhang, Kai Hu, Siqi Zheng

We also demonstrate that the pre-trained models are suitable for downstream tasks, including automatic speech recognition and personalized text-to-speech synthesis.

Automatic Speech Recognition • speech-recognition • +3

CASA-ASR: Context-Aware Speaker-Attributed ASR

no code implementations • 21 May 2023 • Mohan Shi, Zhihao Du, Qian Chen, Fan Yu, Yangze Li, Shiliang Zhang, Jie Zhang, Li-Rong Dai

In addition, a two-pass decoding strategy is proposed to fully leverage the contextual modeling ability, resulting in better recognition performance.

Automatic Speech Recognition • speech-recognition • +1

FunASR: A Fundamental End-to-End Speech Recognition Toolkit

1 code implementation • 18 May 2023 • Zhifu Gao, Zerui Li, JiaMing Wang, Haoneng Luo, Xian Shi, Mengzhe Chen, Yabin Li, Lingyun Zuo, Zhihao Du, Zhangyu Xiao, Shiliang Zhang

FunASR offers models trained on large-scale industrial corpora and the ability to deploy them in applications.

Ranked #1 on Speech Recognition on WenetSpeech (using extra training data)

Action Detection • Activity Detection • +2

TOLD: A Novel Two-Stage Overlap-Aware Framework for Speaker Diarization

1 code implementation • 8 Mar 2023 • JiaMing Wang, Zhihao Du, Shiliang Zhang

Recently, end-to-end neural diarization (EEND) was introduced and has achieved promising results in speaker-overlapped scenarios.

Ranked #1 on Speaker Diarization on CALLHOME

speaker-diarization • Speaker Diarization • +1

A Comparative Study on Multichannel Speaker-Attributed Automatic Speech Recognition in Multi-party Meetings

no code implementations • 1 Nov 2022 • Mohan Shi, Jie Zhang, Zhihao Du, Fan Yu, Qian Chen, Shiliang Zhang, Li-Rong Dai

Speaker-attributed automatic speech recognition (SA-ASR) in multi-party meeting scenarios is one of the most valuable and challenging ASR tasks.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +4

A Comparative Study on Speaker-attributed Automatic Speech Recognition in Multi-party Meetings

no code implementations • 31 Mar 2022 • Fan Yu, Zhihao Du, Shiliang Zhang, Yuxiao Lin, Lei Xie

Therefore, we propose a second approach, WD-SOT, which addresses alignment errors by introducing a word-level diarization model and thereby removes the dependency on timestamp alignment.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +2

Speaker Embedding-aware Neural Diarization: an Efficient Framework for Overlapping Speech Diarization in Meeting Scenarios

1 code implementation • 18 Mar 2022 • Zhihao Du, Shiliang Zhang, Siqi Zheng, Zhijie Yan

Through this formulation, we propose the speaker embedding-aware neural diarization (SEND) framework, where a speech encoder, a speaker encoder, two similarity scorers, and a post-processing network are jointly optimized to predict the encoded labels according to the similarities between speech features and speaker embeddings.

Ranked #1 on Speaker Diarization on AliMeeting

Action Detection • Activity Detection • +2

Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information

2 code implementations • 28 Nov 2021 • Zhihao Du, Shiliang Zhang, Siqi Zheng, Weilong Huang, Ming Lei

In this paper, we reformulate this task as a single-label prediction problem by encoding the multi-speaker labels with a power set.

Action Detection • Activity Detection • +2
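
The power-set reformulation mentioned in this abstract can be illustrated with a minimal sketch. It assumes each frame carries a binary activity flag per speaker and that speaker i maps to bit i of the class index. The function names `powerset_encode` and `powerset_decode` are hypothetical, and this is not taken from the authors' code.

```python
from typing import List


def powerset_encode(frame_labels: List[int]) -> int:
    """Map a multi-hot speaker-activity vector to one class index.

    Assumption: speaker i contributes 2**i when active, so every subset
    of speakers (an element of the power set) gets a unique index.
    """
    return sum(bit << i for i, bit in enumerate(frame_labels))


def powerset_decode(index: int, num_speakers: int) -> List[int]:
    """Recover the multi-hot speaker-activity vector from a class index."""
    return [(index >> i) & 1 for i in range(num_speakers)]


# Three speakers, four frames:
# silence, speaker 0 only, speakers 0 and 2 overlapping, speaker 1 only.
frames = [[0, 0, 0], [1, 0, 0], [1, 0, 1], [0, 1, 0]]
encoded = [powerset_encode(f) for f in frames]
decoded = [powerset_decode(c, 3) for c in encoded]
```

With this mapping a classifier predicts one of 2^N classes per frame instead of N independent binary labels, and decoding the predicted index recovers which speakers overlap.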

A Monaural Speech Enhancement Method for Robust Small-Footprint Keyword Spotting

no code implementations • 20 Jun 2019 • Yue Gu, Zhihao Du, Hui Zhang, Xueliang Zhang

To improve robustness, a speech enhancement front-end is employed.

Small-Footprint Keyword Spotting • Speech Enhancement

Acoustic Scene Classification by Implicitly Identifying Distinct Sound Events

1 code implementation • 10 Apr 2019 • Hongwei Song, Jiqing Han, Shiwen Deng, Zhihao Du

In this paper, we propose a new strategy for acoustic scene classification (ASC), namely recognizing acoustic scenes through identifying distinct sound events.

Acoustic Scene Classification • Classification • +2
