Search Results for author: Minglun Han

Found 8 papers, 6 papers with code

VILAS: Exploring the Effects of Vision and Language Context in Automatic Speech Recognition

no code implementations • 31 May 2023 • Ziyi Ni, Minglun Han, Feilong Chen, Linghui Meng, Jing Shi, Pin Lv, Bo Xu

In this paper, we first propose ViLaS (Vision and Language into Automatic Speech Recognition), a novel multimodal ASR model based on the continuous integrate-and-fire (CIF) mechanism, which can integrate visual and textual context simultaneously or separately, to facilitate speech recognition.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages

2 code implementations • 7 May 2023 • Feilong Chen, Minglun Han, Haozhi Zhao, Qingyang Zhang, Jing Shi, Shuang Xu, Bo Xu

(3) Integrating multiple modalities: all single-modal encoders are aligned with the LLM through X2L interfaces to integrate multimodal capabilities into the LLM.

Attribute Instruction Following +4

901

Paper
Code

Matching-based Term Semantics Pre-training for Spoken Patient Query Understanding

1 code implementation • 2 Mar 2023 • Zefa Hu, Xiuyi Chen, Haoran Wu, Minglun Han, Ziyi Ni, Jing Shi, Shuang Xu, Bo Xu

Medical Slot Filling (MSF) task aims to convert medical queries into structured information, playing an essential role in diagnosis dialogue systems.

slot-filling Slot Filling

2

Paper
Code

Complex Dynamic Neurons Improved Spiking Transformer Network for Efficient Automatic Speech Recognition

1 code implementation • 2 Feb 2023 • Minglun Han, Qingyu Wang, Tielin Zhang, Yi Wang, Duzhen Zhang, Bo Xu

The spiking neural network (SNN) using leaky-integrated-and-fire (LIF) neurons has been commonly used in automatic speech recognition (ASR) tasks.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

62

Paper
Code

Knowledge Transfer from Pre-trained Language Models to Cif-based Speech Recognizers via Hierarchical Distillation

2 code implementations • 30 Jan 2023 • Minglun Han, Feilong Chen, Jing Shi, Shuang Xu, Bo Xu

Large-scale pre-trained language models (PLMs) have shown great potential in natural language processing tasks.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

62

Paper
Code

VLP: A Survey on Vision-Language Pre-training

1 code implementation • 18 Feb 2022 • Feilong Chen, Duzhen Zhang, Minglun Han, Xiuyi Chen, Jing Shi, Shuang Xu, Bo Xu

Finally, we discuss the new frontiers in VLP.

278

Paper
Code

Improving End-to-End Contextual Speech Recognition with Fine-Grained Contextual Knowledge Selection

1 code implementation • 30 Jan 2022 • Minglun Han, Linhao Dong, Zhenlin Liang, Meng Cai, Shiyu Zhou, Zejun Ma, Bo Xu

Nowadays, most methods in end-to-end contextual speech recognition bias the recognition process towards contextual knowledge.

speech-recognition Speech Recognition

23

Paper
Code

CIF-based Collaborative Decoding for End-to-end Contextual Speech Recognition

no code implementations • 17 Dec 2020 • Minglun Han, Linhao Dong, Shiyu Zhou, Bo Xu

End-to-end (E2E) models have achieved promising results on multiple speech recognition benchmarks, and shown the potential to become the mainstream.

Decoder speech-recognition +1

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.