Search Results for author: Qiujia Li

Found 18 papers, 4 papers with code

Handling Ambiguity in Emotion: From Out-of-Domain Detection to Distribution Estimation

no code implementations • 20 Feb 2024 • Wen Wu, Bo Li, Chao Zhang, Chung-Cheng Chiu, Qiujia Li, Junwen Bai, Tara N. Sainath, Philip C. Woodland

The evidential uncertainty measure is extended to quantify the uncertainty in emotion distribution estimation.

Classification Emotion Classification

Efficient Adapter Finetuning for Tail Languages in Streaming Multilingual ASR

no code implementations • 17 Jan 2024 • Junwen Bai, Bo Li, Qiujia Li, Tara N. Sainath, Trevor Strohman

Meanwhile, the heterogeneous nature and imbalanced data abundance of different languages may cause performance degradation, leading to asynchronous peak performance for different languages during training, especially on tail ones.

Massive End-to-end Models for Short Search Queries

no code implementations • 22 Sep 2023 • Weiran Wang, Rohit Prabhavalkar, Dongseong Hwang, Qiujia Li, Khe Chai Sim, Bo Li, James Qin, Xingyu Cai, Adam Stooke, Zhong Meng, CJ Zheng, Yanzhang He, Tara Sainath, Pedro Moreno Mengibar

In this work, we investigate two popular end-to-end automatic speech recognition (ASR) models, namely Connectionist Temporal Classification (CTC) and RNN-Transducer (RNN-T), for offline recognition of voice search queries, with up to 2B model parameters.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Modular Domain Adaptation for Conformer-Based Streaming ASR

no code implementations • 22 May 2023 • Qiujia Li, Bo Li, Dongseong Hwang, Tara N. Sainath, Pedro M. Mengibar

Speech data from different domains has distinct acoustic and linguistic characteristics.

Domain Adaptation speech-recognition +1

Knowledge Distillation from Multiple Foundation Models for End-to-End Speech Recognition

no code implementations • 20 Mar 2023 • Xiaoyu Yang, Qiujia Li, Chao Zhang, Philip C. Woodland

The performance of the student model can be further enhanced when multiple teachers are used jointly, achieving word error rate reductions (WERRs) of 17.5% and 10.6%.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Knowledge Distillation for Neural Transducers from Large Self-Supervised Pre-trained Models

no code implementations • 7 Oct 2021 • Xiaoyu Yang, Qiujia Li, Philip C. Woodland

Self-supervised pre-training is an effective approach to leveraging a large amount of unlabelled data to reduce word error rates (WERs) of automatic speech recognition (ASR) systems.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Improving Confidence Estimation on Out-of-Domain Data for End-to-End Speech Recognition

no code implementations • 7 Oct 2021 • Qiujia Li, Yu Zhang, David Qiu, Yanzhang He, Liangliang Cao, Philip C. Woodland

As end-to-end automatic speech recognition (ASR) models reach promising performance, various downstream tasks rely on good confidence estimators for these systems.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Combining Frame-Synchronous and Label-Synchronous Systems for Speech Recognition

1 code implementation • 1 Jul 2021 • Qiujia Li, Chao Zhang, Philip C. Woodland

Commonly used automatic speech recognition (ASR) systems can be classified into frame-synchronous and label-synchronous categories, based on whether the speech is decoded on a per-frame or per-label basis.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Multi-Task Learning for End-to-End ASR Word and Utterance Confidence with Deletion Prediction

no code implementations • 26 Apr 2021 • David Qiu, Yanzhang He, Qiujia Li, Yu Zhang, Liangliang Cao, Ian McGraw

Confidence scores are very useful for downstream applications of automatic speech recognition (ASR) systems.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Residual Energy-Based Models for End-to-End Speech Recognition

no code implementations • 25 Mar 2021 • Qiujia Li, Yu Zhang, Bo Li, Liangliang Cao, Philip C. Woodland

End-to-end models with auto-regressive decoders have shown impressive results for automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Learning Word-Level Confidence For Subword End-to-End ASR

no code implementations • 11 Mar 2021 • David Qiu, Qiujia Li, Yanzhang He, Yu Zhang, Bo Li, Liangliang Cao, Rohit Prabhavalkar, Deepti Bhatia, Wei Li, Ke Hu, Tara N. Sainath, Ian McGraw

We study the problem of word-level confidence estimation in subword-based end-to-end (E2E) models for automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Confidence Estimation for Attention-based Sequence-to-sequence Models for Speech Recognition

1 code implementation • 22 Oct 2020 • Qiujia Li, David Qiu, Yu Zhang, Bo Li, Yanzhang He, Philip C. Woodland, Liangliang Cao, Trevor Strohman

For various speech-related tasks, confidence scores from a speech recogniser are a useful measure to assess the quality of transcriptions.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Discriminative Neural Clustering for Speaker Diarisation

1 code implementation • 22 Oct 2019 • Qiujia Li, Florian L. Kreyssig, Chao Zhang, Philip C. Woodland

In this paper, we propose Discriminative Neural Clustering (DNC) that formulates data clustering with a maximum number of clusters as a supervised sequence-to-sequence learning problem.

Clustering Data Augmentation

Integrating Source-channel and Attention-based Sequence-to-sequence Models for Speech Recognition

no code implementations • 14 Sep 2019 • Qiujia Li, Chao Zhang, Philip C. Woodland

This paper proposes a novel automatic speech recognition (ASR) framework called Integrated Source-Channel and Attention (ISCA) that combines the advantages of traditional systems based on the noisy source-channel model (SC) and end-to-end style systems using attention-based sequence-to-sequence models.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Confidence Estimation and Deletion Prediction Using Bidirectional Recurrent Neural Networks

no code implementations • 30 Oct 2018 • Anton Ragni, Qiujia Li, Mark Gales, Yu Wang

These errors are not accounted for by the standard confidence estimation schemes and are hard to rectify in the upstream and downstream processing.

Bi-Directional Lattice Recurrent Neural Networks for Confidence Estimation

4 code implementations • 30 Oct 2018 • Qiujia Li, Preben Ness, Anton Ragni, Mark Gales

The standard approach to mitigate errors made by an automatic speech recognition system is to use confidence scores associated with each predicted word.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Shape and Material from Sound

no code implementations • NeurIPS 2017 • Zhoutong Zhang, Qiujia Li, Zhengjia Huang, Jiajun Wu, Josh Tenenbaum, Bill Freeman

Hearing an object falling onto the ground, humans can recover rich information including its rough shape, material, and falling height.

Object

Generative Modeling of Audible Shapes for Object Perception

no code implementations • ICCV 2017 • Zhoutong Zhang, Jiajun Wu, Qiujia Li, Zhengjia Huang, James Traer, Josh H. McDermott, Joshua B. Tenenbaum, William T. Freeman

Humans infer rich knowledge of objects from both auditory and visual cues.

Object
