no code implementations • 20 Feb 2024 • Wen Wu, Bo Li, Chao Zhang, Chung-Cheng Chiu, Qiujia Li, Junwen Bai, Tara N. Sainath, Philip C. Woodland
The evidential uncertainty measure is extended to quantify the uncertainty in emotion distribution estimation.
no code implementations • 17 Jan 2024 • Junwen Bai, Bo Li, Qiujia Li, Tara N. Sainath, Trevor Strohman
Meanwhile, the heterogeneous nature and imbalanced data abundance of different languages may cause performance degradation, leading to asynchronous peak performance for different languages during training, especially on tail ones.
no code implementations • 22 Sep 2023 • Weiran Wang, Rohit Prabhavalkar, Dongseong Hwang, Qiujia Li, Khe Chai Sim, Bo Li, James Qin, Xingyu Cai, Adam Stooke, Zhong Meng, CJ Zheng, Yanzhang He, Tara Sainath, Pedro Moreno Mengibar
In this work, we investigate two popular end-to-end automatic speech recognition (ASR) models, namely Connectionist Temporal Classification (CTC) and RNN-Transducer (RNN-T), for offline recognition of voice search queries, with up to 2B model parameters.
no code implementations • 22 May 2023 • Qiujia Li, Bo Li, Dongseong Hwang, Tara N. Sainath, Pedro M. Mengibar
Speech data from different domains has distinct acoustic and linguistic characteristics.
no code implementations • 20 Mar 2023 • Xiaoyu Yang, Qiujia Li, Chao Zhang, Philip C. Woodland
The performance of the student model can be further enhanced when multiple teachers are used jointly, achieving word error rate reductions (WERRs) of 17.5% and 10.6%.
no code implementations • 7 Oct 2021 • Xiaoyu Yang, Qiujia Li, Philip C. Woodland
Self-supervised pre-training is an effective approach to leveraging a large amount of unlabelled data to reduce word error rates (WERs) of automatic speech recognition (ASR) systems.
no code implementations • 7 Oct 2021 • Qiujia Li, Yu Zhang, David Qiu, Yanzhang He, Liangliang Cao, Philip C. Woodland
As end-to-end automatic speech recognition (ASR) models reach promising performance, various downstream tasks rely on good confidence estimators for these systems.
1 code implementation • 1 Jul 2021 • Qiujia Li, Chao Zhang, Philip C. Woodland
Commonly used automatic speech recognition (ASR) systems can be classified into frame-synchronous and label-synchronous categories, based on whether the speech is decoded on a per-frame or per-label basis.
no code implementations • 26 Apr 2021 • David Qiu, Yanzhang He, Qiujia Li, Yu Zhang, Liangliang Cao, Ian McGraw
Confidence scores are very useful for downstream applications of automatic speech recognition (ASR) systems.
no code implementations • 25 Mar 2021 • Qiujia Li, Yu Zhang, Bo Li, Liangliang Cao, Philip C. Woodland
End-to-end models with auto-regressive decoders have shown impressive results for automatic speech recognition (ASR).
no code implementations • 11 Mar 2021 • David Qiu, Qiujia Li, Yanzhang He, Yu Zhang, Bo Li, Liangliang Cao, Rohit Prabhavalkar, Deepti Bhatia, Wei Li, Ke Hu, Tara N. Sainath, Ian McGraw
We study the problem of word-level confidence estimation in subword-based end-to-end (E2E) models for automatic speech recognition (ASR).
1 code implementation • 22 Oct 2020 • Qiujia Li, David Qiu, Yu Zhang, Bo Li, Yanzhang He, Philip C. Woodland, Liangliang Cao, Trevor Strohman
For various speech-related tasks, confidence scores from a speech recogniser are a useful measure to assess the quality of transcriptions.
1 code implementation • 22 Oct 2019 • Qiujia Li, Florian L. Kreyssig, Chao Zhang, Philip C. Woodland
In this paper, we propose Discriminative Neural Clustering (DNC) that formulates data clustering with a maximum number of clusters as a supervised sequence-to-sequence learning problem.
no code implementations • 14 Sep 2019 • Qiujia Li, Chao Zhang, Philip C. Woodland
This paper proposes a novel automatic speech recognition (ASR) framework called Integrated Source-Channel and Attention (ISCA) that combines the advantages of traditional systems based on the noisy source-channel model (SC) and end-to-end style systems using attention-based sequence-to-sequence models.
no code implementations • 30 Oct 2018 • Anton Ragni, Qiujia Li, Mark Gales, Yu Wang
These errors are not accounted for by the standard confidence estimation schemes and are hard to rectify in the upstream and downstream processing.
4 code implementations • 30 Oct 2018 • Qiujia Li, Preben Ness, Anton Ragni, Mark Gales
The standard approach to mitigate errors made by an automatic speech recognition system is to use confidence scores associated with each predicted word.
no code implementations • NeurIPS 2017 • Zhoutong Zhang, Qiujia Li, Zhengjia Huang, Jiajun Wu, Josh Tenenbaum, Bill Freeman
Hearing an object falling onto the ground, humans can recover rich information including its rough shape, material, and falling height.
no code implementations • ICCV 2017 • Zhoutong Zhang, Jiajun Wu, Qiujia Li, Zhengjia Huang, James Traer, Josh H. McDermott, Joshua B. Tenenbaum, William T. Freeman
Humans infer rich knowledge of objects from both auditory and visual cues.