Robust Speech Recognition
22 papers with code • 0 benchmarks • 3 datasets
Latest papers with no code
XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception
XLAVS-R is designed to maximize the benefit of limited multilingual AV pre-training data by building on top of audio-only multilingual pre-training and simplifying existing pre-training schemes.
KinSPEAK: Improving speech recognition for Kinyarwanda via semi-supervised learning methods
In this work, we show that self-supervised pre-training, a simple curriculum schedule during fine-tuning, and semi-supervised learning on large unlabelled speech data significantly improve speech recognition performance for Kinyarwanda.
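The excerpt mentions a "simple curriculum schedule" without specifying it. As a hedged illustration only (the stage count, the difficulty proxy, and the growing-pool policy below are my assumptions, not KinSPEAK's actual schedule), a curriculum can be sketched as ordering utterances by a scalar difficulty score and widening the training pool stage by stage:

```python
import random

def curriculum_batches(utterances, num_stages=3, seed=0):
    """Yield (stage, utterance_ids) pairs, easiest data first.

    `utterances` is a list of (audio_id, difficulty) pairs, where
    `difficulty` is any scalar proxy (e.g. utterance length or an
    estimated SNR rank). Each stage adds the next tranche of harder
    data, so easy examples keep appearing while harder ones arrive.
    All of these choices are illustrative placeholders.
    """
    ordered = sorted(utterances, key=lambda u: u[1])
    stage_size = -(-len(ordered) // num_stages)  # ceiling division
    rng = random.Random(seed)
    for s in range(num_stages):
        pool = ordered[: (s + 1) * stage_size]  # growing pool
        rng.shuffle(pool)
        yield s, [audio_id for audio_id, _ in pool]
```

In practice the difficulty proxy matters more than the stage mechanics; for low-resource languages, a proxy as crude as transcript length can already impose a usable easy-to-hard ordering.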
The CHiME-7 DASR Challenge: Distant Meeting Transcription with Multiple Devices in Diverse Scenarios
The CHiME challenges have played a significant role in the development and evaluation of robust automatic speech recognition (ASR) systems.
Statistical Beamformer Exploiting Non-stationarity and Sparsity with Spatially Constrained ICA for Robust Speech Recognition
In this paper, we present a statistical beamforming algorithm as a pre-processing step for robust automatic speech recognition (ASR).
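The paper's statistical, spatially constrained ICA beamformer is beyond a short snippet, but the core idea of beamforming as an ASR front end can be shown with a plain delay-and-sum beamformer. This is a much simpler fixed baseline, not the paper's method, and it assumes the per-microphone integer sample delays toward the speaker are already known:

```python
def delay_and_sum(channels, delays):
    """Align and average multi-channel audio (delay-and-sum).

    `channels`: equal-length lists of samples, one per microphone.
    `delays`: integer samples by which each channel lags the speaker's
    wavefront; advancing each channel by its delay time-aligns the
    speech, so coherent speech adds up while diffuse noise averages out.
    """
    n = len(channels[0])
    out = []
    for t in range(n):
        acc = 0.0
        for ch, d in zip(channels, delays):
            idx = t + d  # advance this channel by its lag
            if 0 <= idx < n:
                acc += ch[idx]
        out.append(acc / len(channels))
    return out
```

Statistical beamformers replace the fixed averaging weights with weights estimated from the data (e.g. via speech non-stationarity and sparsity, as in the paper), but the align-then-combine structure is the same.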
RescueSpeech: A German Corpus for Speech Recognition in Search and Rescue Domain
Despite the recent advancements in speech recognition, there are still difficulties in accurately transcribing conversational and emotional speech in noisy and reverberant acoustic environments.
Incorporating L2 Phonemes Using Articulatory Features for Robust Speech Recognition
Furthermore, fine-tuning on L2 speech improves recognition accuracy for both L1 and L2 speech without performance trade-offs.
AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR
(ii) We also introduce a simple curriculum scheme during training which we show is crucial to enable the model to jointly process audio and visual information effectively; and finally (iii) we show that our model achieves state-of-the-art zero-shot results on three different AV-ASR benchmarks (How2, VisSpeech and Ego4D), while crucially preserving decent performance on traditional audio-only speech recognition benchmarks (LibriSpeech).
pMCT: Patched Multi-Condition Training for Robust Speech Recognition
For analyses of robust ASR, we evaluate pMCT on the VOiCES dataset, a noisy, reverberant dataset created from LibriSpeech utterances.
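The "patched" part of pMCT refers to splicing an utterance together from patches of its clean and distorted copies, so the model sees acoustic conditions that change mid-utterance. The sketch below conveys that idea under loudly stated assumptions: the fixed patch length, the per-patch swap probability, and the uniform sampling are illustrative placeholders, not the paper's exact scheme:

```python
import random

def patched_multi_condition(clean, distorted, patch_len, p_swap=0.5, seed=0):
    """Sketch of a patched multi-condition augmentation.

    The clean waveform and its distorted (noisy/reverberant) copy are
    cut into fixed-length patches; each output patch is taken from the
    distorted copy with probability `p_swap`, otherwise kept clean.
    Sample values are plain Python numbers to stay dependency-free.
    """
    assert len(clean) == len(distorted)
    rng = random.Random(seed)
    out = []
    for start in range(0, len(clean), patch_len):
        src = distorted if rng.random() < p_swap else clean
        out.extend(src[start:start + patch_len])
    return out
```

Compared with classic multi-condition training, which distorts whole utterances, patch-level mixing exposes the recognizer to within-utterance condition changes at no extra data cost.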
On monoaural speech enhancement for automatic recognition of real noisy speech using mixture invariant training
In this paper, we explore an improved framework to train a monoaural neural enhancement model for robust speech recognition.
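Mixture invariant training (MixIT) lets an enhancement or separation model train on a "mixture of mixtures": the model's estimated sources are routed to the two reference mixtures under whichever assignment minimizes the reconstruction error. A minimal sketch of that loss follows; real systems use SNR-based losses over waveform tensors, whereas plain lists and squared error are used here only to keep the example self-contained:

```python
from itertools import product

def mixit_loss(mix1, mix2, est_sources):
    """Mixture invariant training (MixIT) loss, minimal sketch.

    Each estimated source is assigned to exactly one of the two
    reference mixtures; the loss is the minimum, over all such
    assignments, of the total squared error between each mixture
    and the sum of the sources assigned to it.
    """
    def sq_err(ref, est):
        return sum((r - e) ** 2 for r, e in zip(ref, est))

    n = len(mix1)
    best = None
    # Enumerate every way to route each source to mixture 1 or 2.
    for assign in product((0, 1), repeat=len(est_sources)):
        sums = [[0.0] * n, [0.0] * n]
        for src, a in zip(est_sources, assign):
            for t in range(n):
                sums[a][t] += src[t]
        loss = sq_err(mix1, sums[0]) + sq_err(mix2, sums[1])
        if best is None or loss < best:
            best = loss
    return best
```

Because the assignment is optimized rather than fixed, MixIT needs no isolated ground-truth sources, which is exactly what makes it usable on real noisy recordings like those targeted in this paper.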
End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation
This work presents our end-to-end (E2E) automatic speech recognition (ASR) model targeting robust speech recognition, called Integrated speech Recognition with enhanced speech Input for Self-supervised learning representation (IRIS).