no code implementations • 20 Feb 2024 • Wen Wu, Bo Li, Chao Zhang, Chung-Cheng Chiu, Qiujia Li, Junwen Bai, Tara N. Sainath, Philip C. Woodland
The evidential uncertainty measure is extended to quantify the uncertainty in emotion distribution estimation.
no code implementations • 19 Feb 2024 • Nineli Lashkarashvili, Wen Wu, Guangzhi Sun, Philip C. Woodland
Foundation models have shown superior performance for speech emotion recognition (SER).
no code implementations • 14 Dec 2023 • Keqi Deng, Philip C. Woodland
Recently, connectionist temporal classification (CTC)-based end-to-end (E2E) automatic speech recognition (ASR) models have achieved impressive results, especially with the development of self-supervised learning.
Automatic Speech Recognition (ASR) +2
no code implementations • 19 Nov 2023 • Keqi Deng, Philip C. Woodland
An Auto-regressive Integrate-and-Fire (AIF) mechanism is proposed to generate the label-level encoder representation while retaining low-latency operation suitable for streaming.
Automatic Speech Recognition (ASR) +3
no code implementations • 13 Nov 2023 • Guangzhi Sun, Shutong Feng, Dongcheng Jiang, Chao Zhang, Milica Gašić, Philip C. Woodland
Recent advances in large language models (LLMs) have demonstrated unprecedented abilities across a wide range of language tasks.
1 code implementation • 30 Sep 2023 • Wen Wu, Wenlin Chen, Chao Zhang, Philip C. Woodland
Human annotator simulation (HAS) serves as a cost-effective substitute for human involvement in tasks such as data annotation and system assessment.
no code implementations • 25 Aug 2023 • Keqi Deng, Philip C. Woodland
Although end-to-end (E2E) trainable automatic speech recognition (ASR) has shown great success by jointly learning acoustic and linguistic information, it still suffers from the effect of domain shifts, thus limiting potential applications.
Automatic Speech Recognition (ASR) +4
1 code implementation • 14 Aug 2023 • Wen Wu, Chao Zhang, Philip C. Woodland
Two metrics are proposed to evaluate AER performance with automatic segmentation based on time-weighted emotion and speaker classification errors.
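Reading "time-weighted classification errors" literally, one simple instantiation weights each misclassified stretch of audio by its duration. The sketch below is illustrative only: the segment format and function name are assumptions, and the paper's exact metrics may differ.

```python
def time_weighted_error(ref, hyp):
    """Duration-weighted classification error rate.

    `ref` and `hyp` are lists of (start, end, label) tuples covering the
    same recording.  The error is the fraction of total time whose
    reference and hypothesis labels disagree.  (Illustrative definition
    only; not necessarily the paper's exact metric.)
    """
    # Collect all boundary points so the two segmentations align.
    bounds = sorted({t for s, e, _ in ref + hyp for t in (s, e)})

    def label_at(segments, t):
        for s, e, lab in segments:
            if s <= t < e:
                return lab
        return None

    total = wrong = 0.0
    for a, b in zip(bounds, bounds[1:]):
        dur = b - a
        total += dur
        if label_at(ref, a) != label_at(hyp, a):
            wrong += dur
    return wrong / total if total else 0.0
```

For example, a one-second emotion-label disagreement within a four-second recording yields an error of 0.25.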
no code implementations • 6 Jul 2023 • Keqi Deng, Philip C. Woodland
Hence, blank tokens are no longer needed, and the prediction network can easily be adapted using text data.
no code implementations • 4 Jul 2023 • Guangzhi Sun, Chao Zhang, Ivan Vulić, Paweł Budzianowski, Philip C. Woodland
In this work, we propose a Knowledge-Aware Audio-Grounded generative slot-filling framework, termed KA2G, that focuses on few-shot and zero-shot slot filling for ToD with speech input.
Automatic Speech Recognition (ASR) +6
1 code implementation • 11 Jun 2023 • Wen Wu, Chao Zhang, Philip C. Woodland
In automatic emotion recognition (AER), labels assigned by different human annotators to the same utterance are often inconsistent due to the inherent complexity of emotion and the subjectivity of perception.
1 code implementation • 2 Jun 2023 • Guangzhi Sun, Xianrui Zheng, Chao Zhang, Philip C. Woodland
End-to-end automatic speech recognition (ASR) and large language models, such as Whisper and GPT-2, have recently been scaled to use vast amounts of training data.
Automatic Speech Recognition (ASR) +1
no code implementations • 20 May 2023 • Wen Wu, Chao Zhang, Philip C. Woodland
This paper proposes handling training data sparsity in speech-based automatic depression detection (SDD) using foundation models pre-trained with self-supervised learning (SSL).
Automatic Speech Recognition (ASR) +5
no code implementations • 3 Apr 2023 • Yuang Li, Xianrui Zheng, Philip C. Woodland
In this paper, seven SSL models were compared on both simulated and real-world corpora.
Automatic Speech Recognition (ASR) +2
no code implementations • 20 Mar 2023 • Xiaoyu Yang, Qiujia Li, Chao Zhang, Philip C. Woodland
The performance of the student model can be further enhanced when multiple teachers are used jointly, achieving word error rate reductions (WERRs) of 17.5% and 10.6%.
Automatic Speech Recognition (ASR) +3
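For reference, a relative word error rate reduction (WERR) such as the 17.5% quoted above is conventionally computed as the WER drop divided by the baseline WER:

```python
def werr(baseline_wer, new_wer):
    """Relative word error rate reduction (WERR), in percent.

    Conventional definition: (baseline - new) / baseline * 100.
    """
    return 100.0 * (baseline_wer - new_wer) / baseline_wer

# e.g. a baseline WER of 10.0% reduced to 8.25% is a 17.5% WERR
```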
no code implementations • 16 Feb 2023 • Keqi Deng, Philip C. Woodland
End-to-end (E2E) automatic speech recognition (ASR) implicitly learns the token sequence distribution of paired audio-transcript training data.
Automatic Speech Recognition (ASR) +3
1 code implementation • 9 Nov 2022 • Wen Wu, Chao Zhang, Philip C. Woodland
Automatic emotion recognition in conversation (ERC) is crucial for emotion-aware conversational artificial intelligence.
no code implementations • 4 Nov 2022 • Florian L. Kreyssig, Yangyang Shi, Jinxi Guo, Leda Sari, Abdelrahman Mohamed, Philip C. Woodland
Furthermore, this paper proposes a variant of MPPT that allows low-footprint streaming models to be trained effectively by computing the MPPT loss on masked and unmasked frames.
Automatic Speech Recognition (ASR) +2
1 code implementation • 29 Oct 2022 • Guangzhi Sun, Chao Zhang, Philip C. Woodland
Specifically, a tree-constrained pointer generator (TCPGen), a powerful and efficient biasing model component, is studied, which leverages a slot shortlist with corresponding entities to extract biasing lists.
no code implementations • 8 Jul 2022 • Xianrui Zheng, Chao Zhang, Philip C. Woodland
Self-supervised-learning-based pre-trained models for speech data, such as Wav2Vec 2.0 (W2V2), have become the backbone of many speech tasks.
no code implementations • 2 Jul 2022 • Guangzhi Sun, Chao Zhang, Philip C. Woodland
Incorporating biasing words obtained as contextual knowledge is critical for many automatic speech recognition (ASR) applications.
Automatic Speech Recognition (ASR) +1
no code implementations • 8 Mar 2022 • Wen Wu, Chao Zhang, Xixin Wu, Philip C. Woodland
In this paper, a novel Bayesian training loss based on per-utterance Dirichlet prior distributions is proposed for verbal emotion recognition, which models the uncertainty in one-hot labels created when human annotators assign the same utterance to different emotion classes.
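One way to read the per-utterance Dirichlet-prior idea is as a Dirichlet-multinomial likelihood over annotator vote counts, with the network predicting the per-class concentration parameters. The sketch below is an illustrative instantiation only, not the paper's exact loss; the function name and interface are hypothetical.

```python
from math import lgamma

def dirichlet_multinomial_nll(alpha, counts):
    """Negative log-likelihood of per-class annotator vote counts under
    a Dirichlet prior with concentration parameters `alpha` (one value
    per emotion class, all > 0), with the class proportions integrated
    out.  `counts[k]` is how many annotators chose class k.
    (Illustrative only; the paper's exact loss may differ.)
    """
    n = sum(counts)
    a0 = sum(alpha)
    # Ordered-votes Dirichlet-multinomial log-likelihood; the
    # multinomial coefficient is omitted as it is constant in alpha.
    log_lik = (lgamma(a0) - lgamma(n + a0)
               + sum(lgamma(c + a) - lgamma(a)
                     for c, a in zip(counts, alpha)))
    return -log_lik
```

With a uniform prior `alpha = [1, 1]` and a single vote for the first class, the NLL equals log 2, i.e. the predictive probability of that vote is 0.5; concentrating `alpha` on the observed class lowers the loss.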
no code implementations • 7 Oct 2021 • Xiaoyu Yang, Qiujia Li, Philip C. Woodland
Self-supervised pre-training is an effective approach to leveraging a large amount of unlabelled data to reduce word error rates (WERs) of automatic speech recognition (ASR) systems.
Automatic Speech Recognition (ASR) +3
no code implementations • 7 Oct 2021 • Qiujia Li, Yu Zhang, David Qiu, Yanzhang He, Liangliang Cao, Philip C. Woodland
As end-to-end automatic speech recognition (ASR) models reach promising performance, various downstream tasks rely on good confidence estimators for these systems.
Automatic Speech Recognition (ASR) +2
no code implementations • 1 Sep 2021 • Guangzhi Sun, Chao Zhang, Philip C. Woodland
Contextual knowledge is important for real-world automatic speech recognition (ASR) applications.
Automatic Speech Recognition (ASR) +2
no code implementations • 29 Jul 2021 • Xianrui Zheng, Chao Zhang, Philip C. Woodland
Furthermore, on the AMI corpus, the proposed conversion for language prior probabilities enables BERT to obtain an extra 3% relative WERR, and the combination of BERT, GPT and GPT-2 results in further improvements.
Automatic Speech Recognition (ASR) +2
1 code implementation • 1 Jul 2021 • Qiujia Li, Chao Zhang, Philip C. Woodland
Commonly used automatic speech recognition (ASR) systems can be classified into frame-synchronous and label-synchronous categories, based on whether the speech is decoded on a per-frame or per-label basis.
Automatic Speech Recognition (ASR) +2
no code implementations • 25 Mar 2021 • Qiujia Li, Yu Zhang, Bo Li, Liangliang Cao, Philip C. Woodland
End-to-end models with auto-regressive decoders have shown impressive results for automatic speech recognition (ASR).
Automatic Speech Recognition (ASR) +2
no code implementations • 12 Mar 2021 • Adnan Haider, Chao Zhang, Florian L. Kreyssig, Philip C. Woodland
This paper presents a novel natural gradient and Hessian-free (NGHF) optimisation framework for neural network training that can operate efficiently in a distributed manner.
Automatic Speech Recognition (ASR) +2
no code implementations • 27 Oct 2020 • Wen Wu, Chao Zhang, Philip C. Woodland
In this paper, a novel two-branch neural network model structure is proposed for multimodal emotion recognition, which consists of a time synchronous branch (TSB) and a time asynchronous branch (TAB).
Automatic Speech Recognition (ASR) +6
1 code implementation • 22 Oct 2020 • Qiujia Li, David Qiu, Yu Zhang, Bo Li, Yanzhang He, Philip C. Woodland, Liangliang Cao, Trevor Strohman
For various speech-related tasks, confidence scores from a speech recogniser are a useful measure to assess the quality of transcriptions.
Automatic Speech Recognition (ASR) +3
no code implementations • 10 Nov 2019 • Yassir Fathullah, Chao Zhang, Philip C. Woodland
Modern speaker diarisation systems use embeddings generated from speech segments in a bottleneck layer, which need to be discriminative for unseen speakers.
1 code implementation • 22 Oct 2019 • Qiujia Li, Florian L. Kreyssig, Chao Zhang, Philip C. Woodland
In this paper, we propose Discriminative Neural Clustering (DNC) that formulates data clustering with a maximum number of clusters as a supervised sequence-to-sequence learning problem.
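Posing clustering as supervised sequence-to-sequence learning requires targets that are invariant to permutations of the cluster identities. One common device for this (illustrative here, not necessarily DNC's exact scheme) is to renumber clusters by order of first appearance:

```python
def first_appearance_relabel(labels):
    """Map arbitrary cluster labels to a canonical form in which
    clusters are numbered by order of first appearance, so any
    permutation of the label alphabet yields the same target sequence.
    (A common trick for posing clustering as sequence prediction;
    illustrative, not necessarily the paper's exact scheme.)
    """
    mapping = {}
    out = []
    for lab in labels:
        if lab not in mapping:
            mapping[lab] = len(mapping)  # next unseen cluster index
        out.append(mapping[lab])
    return out

# 'B B A C A' and 'A A C B C' both map to [0, 0, 1, 2, 1]
```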
no code implementations • 14 Sep 2019 • Qiujia Li, Chao Zhang, Philip C. Woodland
This paper proposes a novel automatic speech recognition (ASR) framework called Integrated Source-Channel and Attention (ISCA) that combines the advantages of traditional systems based on the noisy source-channel model (SC) and end-to-end style systems using attention-based sequence-to-sequence models.
Automatic Speech Recognition (ASR) +1
no code implementations • 6 Apr 2018 • Adnan Haider, Philip C. Woodland
Deep Neural Network (DNN) acoustic models often use discriminative sequence training that optimises an objective function that better approximates the word error rate (WER) than frame-based training.
2 code implementations • 2 Oct 2016 • Yanmin Qian, Philip C. Woodland
On the Aurora 4 task, the very deep CNN achieves a WER of 8.81%, which is further reduced to 7.99% with auxiliary feature joint training and to 7.09% with LSTM-RNN joint decoding.