no code implementations • 20 Feb 2024 • Wen Wu, Bo Li, Chao Zhang, Chung-Cheng Chiu, Qiujia Li, Junwen Bai, Tara N. Sainath, Philip C. Woodland
The evidential uncertainty measure is extended to quantify the uncertainty in emotion distribution estimation.
no code implementations • 19 Feb 2024 • Nineli Lashkarashvili, Wen Wu, Guangzhi Sun, Philip C. Woodland
Foundation models have shown superior performance for speech emotion recognition (SER).
no code implementations • 14 Dec 2023 • Keqi Deng, Philip C. Woodland
Recently, connectionist temporal classification (CTC)-based end-to-end (E2E) automatic speech recognition (ASR) models have achieved impressive results, especially with the development of self-supervised learning.
Automatic Speech Recognition (ASR) +2
no code implementations • 19 Nov 2023 • Keqi Deng, Philip C. Woodland
An Auto-regressive Integrate-and-Fire (AIF) mechanism is proposed to generate the label-level encoder representation while retaining low-latency operation suitable for streaming.
Automatic Speech Recognition (ASR) +3
no code implementations • 13 Nov 2023 • Guangzhi Sun, Shutong Feng, Dongcheng Jiang, Chao Zhang, Milica Gašić, Philip C. Woodland
Recent advances in large language models (LLMs) have demonstrated unprecedented abilities across a wide range of language tasks.
1 code implementation • 30 Sep 2023 • Wen Wu, Wenlin Chen, Chao Zhang, Philip C. Woodland
Human annotator simulation (HAS) serves as a cost-effective substitute for human involvement in tasks such as data annotation and system assessment.
no code implementations • 25 Aug 2023 • Keqi Deng, Philip C. Woodland
Although end-to-end (E2E) trainable automatic speech recognition (ASR) has shown great success by jointly learning acoustic and linguistic information, it still suffers from the effect of domain shifts, thus limiting potential applications.
Automatic Speech Recognition (ASR) +4
1 code implementation • 14 Aug 2023 • Wen Wu, Chao Zhang, Philip C. Woodland
Two metrics are proposed to evaluate AER performance with automatic segmentation based on time-weighted emotion and speaker classification errors.
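Reading "time-weighted classification errors" literally, one simple instantiation weights each misclassified stretch of audio by its duration. The sketch below is illustrative only: the segment format and function name are assumptions, and the paper's exact metrics may differ.

```python
def time_weighted_error(ref, hyp):
    """Duration-weighted classification error rate.

    `ref` and `hyp` are lists of (start, end, label) tuples covering the
    same recording.  The error is the fraction of total time whose
    reference and hypothesis labels disagree.  (Illustrative definition
    only; not necessarily the paper's exact metric.)
    """
    # Collect all boundary points so the two segmentations align.
    bounds = sorted({t for s, e, _ in ref + hyp for t in (s, e)})

    def label_at(segments, t):
        for s, e, lab in segments:
            if s <= t < e:
                return lab
        return None

    total = wrong = 0.0
    for a, b in zip(bounds, bounds[1:]):
        dur = b - a
        total += dur
        if label_at(ref, a) != label_at(hyp, a):
            wrong += dur
    return wrong / total if total else 0.0
```

For example, a one-second emotion-label disagreement within a four-second recording yields an error of 0.25.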
no code implementations • 6 Jul 2023 • Keqi Deng, Philip C. Woodland
Hence, blank tokens are no longer needed, and the prediction network can easily be adapted using text data.
no code implementations • 4 Jul 2023 • Guangzhi Sun, Chao Zhang, Ivan Vulić, Paweł Budzianowski, Philip C. Woodland
In this work, we propose a Knowledge-Aware Audio-Grounded generative slot-filling framework, termed KA2G, that focuses on few-shot and zero-shot slot filling for ToD with speech input.
Automatic Speech Recognition (ASR) +6
1 code implementation • 11 Jun 2023 • Wen Wu, Chao Zhang, Philip C. Woodland
In automatic emotion recognition (AER), labels assigned by different human annotators to the same utterance are often inconsistent due to the inherent complexity of emotion and the subjectivity of perception.
1 code implementation • 2 Jun 2023 • Guangzhi Sun, Xianrui Zheng, Chao Zhang, Philip C. Woodland
End-to-end automatic speech recognition (ASR) and large language models, such as Whisper and GPT-2, have recently been scaled to use vast amounts of training data.
Automatic Speech Recognition (ASR) +1
no code implementations • 20 May 2023 • Wen Wu, Chao Zhang, Philip C. Woodland
This paper proposes handling training data sparsity in speech-based automatic depression detection (SDD) using foundation models pre-trained with self-supervised learning (SSL).
Automatic Speech Recognition (ASR) +5
no code implementations • 3 Apr 2023 • Yuang Li, Xianrui Zheng, Philip C. Woodland
In this paper, seven SSL models were compared on both simulated and real-world corpora.
Automatic Speech Recognition (ASR) +2
no code implementations • 20 Mar 2023 • Xiaoyu Yang, Qiujia Li, Chao Zhang, Philip C. Woodland
The performance of the student model can be further enhanced when multiple teachers are used jointly, achieving word error rate reductions (WERRs) of 17.5% and 10.6%.
Automatic Speech Recognition (ASR) +3
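For reference, a relative word error rate reduction (WERR) such as the 17.5% quoted above is conventionally computed as the WER drop divided by the baseline WER:

```python
def werr(baseline_wer, new_wer):
    """Relative word error rate reduction (WERR), in percent.

    Conventional definition: (baseline - new) / baseline * 100.
    """
    return 100.0 * (baseline_wer - new_wer) / baseline_wer

# e.g. a baseline WER of 10.0% reduced to 8.25% is a 17.5% WERR
```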
no code implementations • 16 Feb 2023 • Keqi Deng, Philip C. Woodland
End-to-end (E2E) automatic speech recognition (ASR) implicitly learns the token sequence distribution of paired audio-transcript training data.
Automatic Speech Recognition (ASR) +3
1 code implementation • 9 Nov 2022 • Wen Wu, Chao Zhang, Philip C. Woodland
Automatic emotion recognition in conversation (ERC) is crucial for emotion-aware conversational artificial intelligence.
no code implementations • 4 Nov 2022 • Florian L. Kreyssig, Yangyang Shi, Jinxi Guo, Leda Sari, Abdelrahman Mohamed, Philip C. Woodland
Furthermore, this paper proposes a variant of MPPT that allows low-footprint streaming models to be trained effectively by computing the MPPT loss on masked and unmasked frames.
Automatic Speech Recognition (ASR) +2
1 code implementation • 29 Oct 2022 • Guangzhi Sun, Chao Zhang, Philip C. Woodland
Specifically, a tree-constrained pointer generator (TCPGen), a powerful and efficient biasing model component, is studied, which leverages a slot shortlist with corresponding entities to extract biasing lists.
no code implementations • 8 Jul 2022 • Xianrui Zheng, Chao Zhang, Philip C. Woodland
Self-supervised-learning-based pre-trained models for speech data, such as Wav2Vec 2.0 (W2V2), have become the backbone of many speech tasks.
no code implementations • 2 Jul 2022 • Guangzhi Sun, Chao Zhang, Philip C. Woodland
Incorporating biasing words obtained as contextual knowledge is critical for many automatic speech recognition (ASR) applications.
Automatic Speech Recognition (ASR) +1
no code implementations • 8 Mar 2022 • Wen Wu, Chao Zhang, Xixin Wu, Philip C. Woodland
In this paper, a novel Bayesian training loss based on per-utterance Dirichlet prior distributions is proposed for verbal emotion recognition, which models the uncertainty in one-hot labels created when human annotators assign the same utterance to different emotion classes.
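One way to read the per-utterance Dirichlet-prior idea is as a Dirichlet-multinomial likelihood over annotator vote counts, with the network predicting the per-class concentration parameters. The sketch below is an illustrative instantiation only, not the paper's exact loss; the function name and interface are hypothetical.

```python
from math import lgamma

def dirichlet_multinomial_nll(alpha, counts):
    """Negative log-likelihood of per-class annotator vote counts under
    a Dirichlet prior with concentration parameters `alpha` (one value
    per emotion class, all > 0), with the class proportions integrated
    out.  `counts[k]` is how many annotators chose class k.
    (Illustrative only; the paper's exact loss may differ.)
    """
    n = sum(counts)
    a0 = sum(alpha)
    # Ordered-votes Dirichlet-multinomial log-likelihood; the
    # multinomial coefficient is omitted as it is constant in alpha.
    log_lik = (lgamma(a0) - lgamma(n + a0)
               + sum(lgamma(c + a) - lgamma(a)
                     for c, a in zip(counts, alpha)))
    return -log_lik
```

With a uniform prior `alpha = [1, 1]` and a single vote for the first class, the NLL equals log 2, i.e. the predictive probability of that vote is 0.5; concentrating `alpha` on the observed class lowers the loss.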
no code implementations • 7 Oct 2021 • Xiaoyu Yang, Qiujia Li, Philip C. Woodland
Self-supervised pre-training is an effective approach to leveraging a large amount of unlabelled data to reduce word error rates (WERs) of automatic speech recognition (ASR) systems.
Automatic Speech Recognition (ASR) +3
no code implementations • 7 Oct 2021 • Qiujia Li, Yu Zhang, David Qiu, Yanzhang He, Liangliang Cao, Philip C. Woodland
As end-to-end automatic speech recognition (ASR) models reach promising performance, various downstream tasks rely on good confidence estimators for these systems.
Automatic Speech Recognition (ASR) +2
no code implementations • 1 Sep 2021 • Guangzhi Sun, Chao Zhang, Philip C. Woodland
Contextual knowledge is important for real-world automatic speech recognition (ASR) applications.
Automatic Speech Recognition (ASR) +2
no code implementations • 29 Jul 2021 • Xianrui Zheng, Chao Zhang, Philip C. Woodland
Furthermore, on the AMI corpus, the proposed conversion for language prior probabilities enables BERT to obtain an extra 3% relative WERR, and the combination of BERT, GPT and GPT-2 results in further improvements.
Automatic Speech Recognition (ASR) +2
1 code implementation • 1 Jul 2021 • Qiujia Li, Chao Zhang, Philip C. Woodland
Commonly used automatic speech recognition (ASR) systems can be classified into frame-synchronous and label-synchronous categories, based on whether the speech is decoded on a per-frame or per-label basis.
Automatic Speech Recognition (ASR) +2
no code implementations • 25 Mar 2021 • Qiujia Li, Yu Zhang, Bo Li, Liangliang Cao, Philip C. Woodland
End-to-end models with auto-regressive decoders have shown impressive results for automatic speech recognition (ASR).
Automatic Speech Recognition (ASR) +2
no code implementations • 12 Mar 2021 • Adnan Haider, Chao Zhang, Florian L. Kreyssig, Philip C. Woodland
This paper presents a novel natural gradient and Hessian-free (NGHF) optimisation framework for neural network training that can operate efficiently in a distributed manner.
Automatic Speech Recognition (ASR) +2
no code implementations • 27 Oct 2020 • Wen Wu, Chao Zhang, Philip C. Woodland
In this paper, a novel two-branch neural network model structure is proposed for multimodal emotion recognition, which consists of a time synchronous branch (TSB) and a time asynchronous branch (TAB).
Automatic Speech Recognition (ASR) +6
1 code implementation • 22 Oct 2020 • Qiujia Li, David Qiu, Yu Zhang, Bo Li, Yanzhang He, Philip C. Woodland, Liangliang Cao, Trevor Strohman
For various speech-related tasks, confidence scores from a speech recogniser are a useful measure to assess the quality of transcriptions.
Automatic Speech Recognition (ASR) +3
no code implementations • 10 Nov 2019 • Yassir Fathullah, Chao Zhang, Philip C. Woodland
Modern speaker diarisation systems use embeddings generated from speech segments in a bottleneck layer, which need to be discriminative for unseen speakers.
1 code implementation • 22 Oct 2019 • Qiujia Li, Florian L. Kreyssig, Chao Zhang, Philip C. Woodland
In this paper, we propose Discriminative Neural Clustering (DNC) that formulates data clustering with a maximum number of clusters as a supervised sequence-to-sequence learning problem.
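Posing clustering as supervised sequence-to-sequence learning requires targets that are invariant to permutations of the cluster identities. One common device for this (illustrative here, not necessarily DNC's exact scheme) is to renumber clusters by order of first appearance:

```python
def first_appearance_relabel(labels):
    """Map arbitrary cluster labels to a canonical form in which
    clusters are numbered by order of first appearance, so any
    permutation of the label alphabet yields the same target sequence.
    (A common trick for posing clustering as sequence prediction;
    illustrative, not necessarily the paper's exact scheme.)
    """
    mapping = {}
    out = []
    for lab in labels:
        if lab not in mapping:
            mapping[lab] = len(mapping)  # next unseen cluster index
        out.append(mapping[lab])
    return out

# 'B B A C A' and 'A A C B C' both map to [0, 0, 1, 2, 1]
```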
no code implementations • 14 Sep 2019 • Qiujia Li, Chao Zhang, Philip C. Woodland
This paper proposes a novel automatic speech recognition (ASR) framework called Integrated Source-Channel and Attention (ISCA) that combines the advantages of traditional systems based on the noisy source-channel model (SC) and end-to-end style systems using attention-based sequence-to-sequence models.
Automatic Speech Recognition (ASR) +1
no code implementations • 6 Apr 2018 • Adnan Haider, Philip C. Woodland
Deep Neural Network (DNN) acoustic models often use discriminative sequence training that optimises an objective function that better approximates the word error rate (WER) than frame-based training.
2 code implementations • 2 Oct 2016 • Yanmin Qian, Philip C. Woodland
On the Aurora 4 task, the very deep CNN achieves a WER of 8.81%, which is further reduced to 7.99% with auxiliary feature joint training and to 7.09% with LSTM-RNN joint decoding.