no code implementations • 4 Sep 2023 • Jiaxu Zhu, Weinan Tong, Yaoxun Xu, Changhe Song, Zhiyong Wu, Zhao You, Dan Su, Dong Yu, Helen Meng
Mapping two modalities, speech and text, into a shared representation space is a research direction that uses text-only data to improve end-to-end automatic speech recognition (ASR) performance in new domains.
Automatic Speech Recognition (ASR) +2
1 code implementation • 7 Apr 2022 • Zhao You, Shulin Feng, Dan Su, Dong Yu
Recently, the Conformer-based CTC/AED model has become a mainstream architecture for ASR.
Ranked #2 on Speech Recognition on WenetSpeech
no code implementations • 23 Nov 2021 • Zhao You, Shulin Feng, Dan Su, Dong Yu
Mixture-of-experts-based acoustic models with dynamic routing mechanisms have shown promising results for speech recognition.
2 code implementations • 13 Jun 2021 • Guoguo Chen, Shuzhou Chai, Guanbo Wang, Jiayu Du, Wei-Qiang Zhang, Chao Weng, Dan Su, Daniel Povey, Jan Trmal, Junbo Zhang, Mingjie Jin, Sanjeev Khudanpur, Shinji Watanabe, Shuaijiang Zhao, Wei Zou, Xiangang Li, Xuchen Yao, Yongqing Wang, Yujun Wang, Zhao You, Zhiyong Yan
This paper introduces GigaSpeech, an evolving, multi-domain English speech recognition corpus with 10,000 hours of high-quality labeled audio suitable for supervised training, and 40,000 hours of total audio suitable for semi-supervised and unsupervised training.
Ranked #1 on Speech Recognition on GigaSpeech
1 code implementation • 7 May 2021 • Zhao You, Shulin Feng, Dan Su, Dong Yu
Recently, Mixture of Experts (MoE) based Transformers have shown promising results in many domains.
no code implementations • 28 Oct 2019 • Zhao You, Dan Su, Jie Chen, Chao Weng, Dong Yu
Self-attention networks (SAN) have been introduced into automatic speech recognition (ASR) and have achieved state-of-the-art performance owing to their superior ability to capture long-term dependencies.
Automatic Speech Recognition (ASR) +1
no code implementations • 9 Jul 2019 • Zhao You, Dan Su, Dong Yu
First, for each domain, a teacher model (a domain-dependent model) is trained by fine-tuning a multi-condition model with a domain-specific subset.
Automatic Speech Recognition (ASR) +1