16 papers with code • 3 benchmarks • 3 datasets
Lip Reading is the task of inferring the speech content of a video using only visual information, especially the lip movements. It has many crucial practical applications, such as assisting audio-based speech recognition, enabling biometric authentication, and aiding hearing-impaired people.
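A minimal sketch of what such a model can look like, assuming pre-cropped grayscale mouth regions as input; the layer widths, clip length, and 500-word vocabulary are illustrative assumptions, not taken from any specific paper on this page:

```python
import torch
import torch.nn as nn

class LipReader(nn.Module):
    """Word-level lip-reading classifier: 3D conv front-end + recurrent back-end."""

    def __init__(self, num_classes=500):
        super().__init__()
        # The 3D convolution captures short-range lip motion across frames.
        self.frontend = nn.Sequential(
            nn.Conv3d(1, 64, kernel_size=(5, 7, 7), stride=(1, 2, 2), padding=(2, 3, 3)),
            nn.BatchNorm3d(64),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d((None, 1, 1)),  # collapse spatial dims, keep time
        )
        # A bidirectional GRU models the temporal dynamics of the whole clip.
        self.gru = nn.GRU(64, 128, num_layers=2, bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * 128, num_classes)

    def forward(self, x):  # x: (batch, 1, frames, height, width)
        feats = self.frontend(x).squeeze(-1).squeeze(-1)  # (batch, 64, frames)
        out, _ = self.gru(feats.transpose(1, 2))          # (batch, frames, 256)
        return self.head(out.mean(dim=1))                 # pool over time, classify

logits = LipReader()(torch.randn(2, 1, 29, 88, 88))  # e.g. 29-frame LRW-style clips
```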
In this work, we explore the task of lip-to-speech synthesis, i.e., learning to generate natural speech given only the lip movements of a speaker.
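Framed as sequence regression, lip-to-speech synthesis can be sketched as a video encoder followed by a mel-spectrogram decoder. The 80 mel bins and 4x temporal upsampling below are assumptions for illustration, not details from the paper:

```python
import torch
import torch.nn as nn

class Lip2Mel(nn.Module):
    """Sketch: encode lip frames, decode a mel-spectrogram for a vocoder."""

    def __init__(self, n_mels=80, upsample=4):
        super().__init__()
        self.encoder = nn.Sequential(  # per-frame visual features
            nn.Conv3d(3, 64, (3, 5, 5), stride=(1, 2, 2), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),
        )
        self.lstm = nn.LSTM(64, 128, batch_first=True, bidirectional=True)
        # Audio frames are denser than video frames, so upsample in time.
        self.upsample = nn.Upsample(scale_factor=upsample, mode="linear", align_corners=False)
        self.to_mel = nn.Conv1d(256, n_mels, kernel_size=3, padding=1)

    def forward(self, video):  # video: (batch, 3, frames, H, W)
        h = self.encoder(video).flatten(2).transpose(1, 2)  # (batch, frames, 64)
        h, _ = self.lstm(h)                                 # (batch, frames, 256)
        h = self.upsample(h.transpose(1, 2))                # (batch, 256, 4*frames)
        return self.to_mel(h)                               # (batch, 80, 4*frames)

mel = Lip2Mel()(torch.randn(1, 3, 25, 48, 96))  # one second of 25 fps video
```

The predicted spectrogram would then be converted to a waveform by a separate vocoder, which this sketch leaves out.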
This benchmark shows large variation in several aspects, including the number of samples per class, video resolution, lighting conditions, and speaker attributes such as pose, age, gender, and make-up.
Ranked #8 on Lipreading on CAS-VSR-W1k (LRW-1000)
Considering the non-negligible effects of these strategies and the difficulty of training an effective lip reading model, we perform a comprehensive quantitative study and comparative analysis, for the first time, to show the effects of several different choices for lip reading.
Ranked #1 on Lipreading on CAS-VSR-W1k (LRW-1000) (using extra training data)
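The excerpt above does not enumerate the training choices it studies. As one illustrative example of the kind of strategy such a study can measure, the sketch below applies MixUp augmentation to video clips; treating MixUp as one of the studied choices, and the alpha default, are assumptions on my part:

```python
import torch
import torch.nn.functional as F

def mixup_clips(clips, labels, num_classes, alpha=0.2):
    """Blend random pairs of clips and their one-hot labels (hypothetical alpha)."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(clips.size(0))
    mixed = lam * clips + (1 - lam) * clips[perm]
    one_hot = F.one_hot(labels, num_classes).float()
    targets = lam * one_hot + (1 - lam) * one_hot[perm]
    return mixed, targets

clips, labels = torch.randn(8, 1, 29, 88, 88), torch.randint(0, 500, (8,))
mixed, soft_targets = mixup_clips(clips, labels, num_classes=500)
```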
In this study, we propose a deep neural network for reconstructing intelligible speech from silent lip movement videos.
Our work improves on existing multimodal deep learning algorithms in two essential ways: (1) it presents a novel method for performing cross-modality fusion (before features are learned from the individual modalities), and (2) it extends the previously proposed cross-connections, which only transfer information between streams that process compatible data.
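A minimal sketch of cross-connections between two modality streams: after each stage, every stream receives a projected copy of the other stream's features. The layer widths and fusion-by-addition rule are assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn

class CrossConnectedStreams(nn.Module):
    """One stage of two parallel streams with lateral cross-connections."""

    def __init__(self, dim=128):
        super().__init__()
        self.audio_stage = nn.Linear(dim, dim)
        self.video_stage = nn.Linear(dim, dim)
        # Lateral projections carry information across modalities.
        self.a2v = nn.Linear(dim, dim)
        self.v2a = nn.Linear(dim, dim)

    def forward(self, audio, video):  # both: (batch, time, dim)
        a = torch.relu(self.audio_stage(audio))
        v = torch.relu(self.video_stage(video))
        # Cross-connection: each stream is updated with the other's features,
        # so information flows even between incompatible modalities.
        return a + self.v2a(v), v + self.a2v(a)

a_out, v_out = CrossConnectedStreams()(torch.randn(2, 29, 128), torch.randn(2, 29, 128))
```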