Search Results for author: Minglun Han

Found 8 papers, 6 papers with code

ViLaS: Exploring the Effects of Vision and Language Context in Automatic Speech Recognition

no code implementations • 31 May 2023 • Ziyi Ni, Minglun Han, Feilong Chen, Linghui Meng, Jing Shi, Pin Lv, Bo Xu

In this paper, we first propose ViLaS (Vision and Language into Automatic Speech Recognition), a novel multimodal ASR model based on the continuous integrate-and-fire (CIF) mechanism, which can integrate visual and textual context simultaneously or separately, to facilitate speech recognition.

Automatic Speech Recognition (ASR) +1

X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages

2 code implementations • 7 May 2023 • Feilong Chen, Minglun Han, Haozhi Zhao, Qingyang Zhang, Jing Shi, Shuang Xu, Bo Xu

(3) Integrating multiple modalities: all single-modal encoders are aligned with the LLM through X2L interfaces to integrate multimodal capabilities into the LLM.

Attribute Instruction Following +4

Matching-based Term Semantics Pre-training for Spoken Patient Query Understanding

1 code implementation • 2 Mar 2023 • Zefa Hu, Xiuyi Chen, Haoran Wu, Minglun Han, Ziyi Ni, Jing Shi, Shuang Xu, Bo Xu

The Medical Slot Filling (MSF) task aims to convert medical queries into structured information, playing an essential role in diagnosis dialogue systems.

Slot Filling

Improving End-to-End Contextual Speech Recognition with Fine-Grained Contextual Knowledge Selection

1 code implementation • 30 Jan 2022 • Minglun Han, Linhao Dong, Zhenlin Liang, Meng Cai, Shiyu Zhou, Zejun Ma, Bo Xu

Nowadays, most methods in end-to-end contextual speech recognition bias the recognition process towards contextual knowledge.

Speech Recognition

CIF-based Collaborative Decoding for End-to-end Contextual Speech Recognition

no code implementations • 17 Dec 2020 • Minglun Han, Linhao Dong, Shiyu Zhou, Bo Xu

End-to-end (E2E) models have achieved promising results on multiple speech recognition benchmarks, and shown the potential to become the mainstream.

Speech Recognition
