
Visual Speech Recognition

7 papers with code · Speech
Subtask of Speech Recognition

Benchmarks

You can find evaluation results in the subtasks. You can also submit evaluation metrics for this task.

Greatest papers with code

Combining Residual Networks with LSTMs for Lipreading

12 Mar 2017 · mpc001/end-to-end-Lipreading

We propose an end-to-end deep learning architecture for word-level visual speech recognition.

LIPREADING LIP READING VISUAL SPEECH RECOGNITION

LRW-1000: A Naturally-Distributed Large-Scale Benchmark for Lip Reading in the Wild

16 Oct 2018 · Fengdalu/Lipreading-DenseNet3D

This benchmark exhibits large variation in several aspects, including the number of samples per class, video resolution, lighting conditions, and speaker attributes such as pose, age, gender, and make-up.

LIPREADING LIP READING VISUAL SPEECH RECOGNITION

Deep word embeddings for visual speech recognition

30 Oct 2017 · tstafylakis/Lipreading-ResNet

In this paper, we present a deep learning architecture for extracting word embeddings for visual speech recognition.

LIPREADING VISUAL SPEECH RECOGNITION WORD EMBEDDINGS

How to Teach DNNs to Pay Attention to the Visual Modality in Speech Recognition

17 Apr 2020 · georgesterpu/Sigmedia-AVSR

AV Align, a recently proposed multimodal fusion strategy based on state-of-the-art sequence-to-sequence neural networks, attempts to model this relationship by explicitly aligning the acoustic and visual representations of speech.

AUDIO-VISUAL SPEECH RECOGNITION VISUAL SPEECH RECOGNITION

Should we hard-code the recurrence concept or learn it instead? Exploring the Transformer architecture for Audio-Visual Speech Recognition

19 May 2020 · georgesterpu/Taris

The audio-visual speech fusion strategy AV Align has shown significant performance improvements in audio-visual speech recognition (AVSR) on the challenging LRS2 dataset.

AUDIO-VISUAL SPEECH RECOGNITION VISUAL SPEECH RECOGNITION

Harnessing GANs for Zero-shot Learning of New Classes in Visual Speech Recognition

29 Jan 2019 · midas-research/DECA

To solve this problem, we present a novel approach to zero-shot learning that generates new classes using Generative Adversarial Networks (GANs), and show that adding unseen-class samples increases the accuracy of a VSR system by a significant margin of 27% and allows it to handle speaker-independent out-of-vocabulary phrases.

VISUAL SPEECH RECOGNITION ZERO-SHOT LEARNING

Deep Audio-Visual Speech Recognition

6 Sep 2018 · amitai1992/AutomatedLipReading

The goal of this work is to recognise phrases and sentences spoken by a talking face, with or without accompanying audio.

Ranked #1 on Lipreading on LRS2 (using extra training data)

AUDIO-VISUAL SPEECH RECOGNITION LIPREADING LIP READING VISUAL SPEECH RECOGNITION