Visual Speech Recognition
40 papers with code • 2 benchmarks • 5 datasets
Latest papers with no code
XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception
XLAVS-R is designed to maximize the benefits of limited multilingual AV pre-training data by building on audio-only multilingual pre-training and simplifying existing pre-training schemes.
JEP-KD: Joint-Embedding Predictive Architecture Based Knowledge Distillation for Visual Speech Recognition
Visual Speech Recognition (VSR) tasks are generally recognized to have a lower theoretical performance ceiling than Automatic Speech Recognition (ASR), owing to the inherent limitations of conveying semantic information visually.
Comparison of Conventional Hybrid and CTC/Attention Decoders for Continuous Visual Speech Recognition
Thanks to the rise of deep learning and the availability of large-scale audio-visual databases, Visual Speech Recognition (VSR) has seen recent advances.
It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition
Recent studies have shown that large language models (LLMs) can be successfully used for generative error correction (GER) on top of the automatic speech recognition (ASR) output.
Multilingual Visual Speech Recognition with a Single Model by Learning with Discrete Visual Speech Units
By using the visual speech units as the inputs of our system, we pre-train the model to predict corresponding text outputs on massive multilingual data constructed by merging several VSR databases.
SlideAVSR: A Dataset of Paper Explanation Videos for Audio-Visual Speech Recognition
Audio-visual speech recognition (AVSR) is a multimodal extension of automatic speech recognition (ASR), using video as a complement to audio.
MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition
While automatic speech recognition (ASR) systems degrade significantly in noisy environments, audio-visual speech recognition (AVSR) systems aim to complement the audio stream with noise-invariant visual cues and improve the system's robustness.
LiteVSR: Efficient Visual Speech Recognition by Learning from Speech Representations of Unlabeled Data
This paper proposes a novel, resource-efficient approach to Visual Speech Recognition (VSR) leveraging speech representations produced by any trained Automatic Speech Recognition (ASR) model.
The GUA-Speech System Description for CNVSRC Challenge 2023
This study describes our system for Task 1 Single-speaker Visual Speech Recognition (VSR) fixed track in the Chinese Continuous Visual Speech Recognition Challenge (CNVSRC) 2023.
Speaker-Adapted End-to-End Visual Speech Recognition for Continuous Spanish
Different studies have shown the importance of visual cues throughout the speech perception process.