Visual Speech Recognition

40 papers with code • 2 benchmarks • 5 datasets

Benchmarks

These leaderboards are used to track progress in Visual Speech Recognition.

Dataset	Best Model
LRS3-TED	CTC/Attention
LRS2	VTP with more data

Subtasks

Lip to Speech Synthesis

Latest papers with no code

XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception

no code yet • 21 Mar 2024

XLAVS-R is designed to maximize the benefits of limited multilingual AV pre-training data by building on top of audio-only multilingual pre-training and simplifying existing pre-training schemes.

JEP-KD: Joint-Embedding Predictive Architecture Based Knowledge Distillation for Visual Speech Recognition

no code yet • 4 Mar 2024

Visual Speech Recognition (VSR) tasks are generally recognized to have a lower theoretical performance ceiling than Automatic Speech Recognition (ASR), owing to the inherent limitations of conveying semantic information visually.

Comparison of Conventional Hybrid and CTC/Attention Decoders for Continuous Visual Speech Recognition

no code yet • 20 Feb 2024

Thanks to the rise of deep learning and the availability of large-scale audio-visual databases, recent advances have been achieved in Visual Speech Recognition (VSR).

It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition

no code yet • 8 Feb 2024

Recent studies have shown that large language models (LLMs) can be successfully used for generative error correction (GER) on top of the automatic speech recognition (ASR) output.

Multilingual Visual Speech Recognition with a Single Model by Learning with Discrete Visual Speech Units

no code yet • 18 Jan 2024

By using the visual speech units as the inputs of our system, we pre-train the model to predict corresponding text outputs on massive multilingual data constructed by merging several VSR databases.

SlideAVSR: A Dataset of Paper Explanation Videos for Audio-Visual Speech Recognition

no code yet • 18 Jan 2024

Audio-visual speech recognition (AVSR) is a multimodal extension of automatic speech recognition (ASR), using video as a complement to audio.

MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition

no code yet • 7 Jan 2024

While automatic speech recognition (ASR) systems degrade significantly in noisy environments, audio-visual speech recognition (AVSR) systems aim to complement the audio stream with noise-invariant visual cues and improve the system's robustness.

LiteVSR: Efficient Visual Speech Recognition by Learning from Speech Representations of Unlabeled Data

no code yet • 15 Dec 2023

This paper proposes a novel, resource-efficient approach to Visual Speech Recognition (VSR) leveraging speech representations produced by any trained Automatic Speech Recognition (ASR) model.

The GUA-Speech System Description for CNVSRC Challenge 2023

no code yet • 12 Dec 2023

This study describes our system for Task 1 Single-speaker Visual Speech Recognition (VSR) fixed track in the Chinese Continuous Visual Speech Recognition Challenge (CNVSRC) 2023.

Speaker-Adapted End-to-End Visual Speech Recognition for Continuous Spanish

no code yet • 21 Nov 2023

Different studies have shown the importance of visual cues throughout the speech perception process.
