Lip Reading
46 papers with code • 3 benchmarks • 5 datasets
Lip reading is the task of inferring the speech content of a video from visual information alone, especially the lip movements. It has many important practical applications, such as assisting audio-based speech recognition, biometric authentication, and aiding hearing-impaired people.
Source: Mutual Information Maximization for Effective Lip Reading
Most implemented papers
Seeing wake words: Audio-visual Keyword Spotting
The goal of this work is to automatically determine whether and when a word of interest is spoken by a talking face, with or without the audio.
Lip-reading with Densely Connected Temporal Convolutional Networks
In this work, we present the Densely Connected Temporal Convolutional Network (DC-TCN) for lip-reading of isolated words.
Learn an Effective Lip Reading Model without Pains
Given the non-negligible effects of these training strategies and the difficulty of training an effective lip reading model, we perform the first comprehensive quantitative study and comparative analysis of several different design choices for lip reading.
Contrastive Learning of Global-Local Video Representations
In this work, we propose to learn video representations that generalize both to tasks that require global semantic information (e.g., classification) and to tasks that require local fine-grained spatio-temporal information (e.g., localization).
Multi-Perspective LSTM for Joint Visual Representation Learning
We validate the performance of our proposed architecture on two multi-perspective visual recognition tasks, namely lip reading and face recognition.
Selective Listening by Synchronizing Speech with Lips
A speaker extraction algorithm seeks to extract the speech of a target speaker from a multi-talker speech mixture when given a cue that represents the target speaker, such as a pre-enrolled speech utterance, or an accompanying video track.
Visual Keyword Spotting with Attention
In this paper, we consider the task of spotting spoken keywords in silent video sequences -- also known as visual keyword spotting.
Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition
In particular, the audio and visual front-ends are first trained on large-scale unimodal datasets; we then integrate components of both front-ends into a larger multimodal framework that learns to transcribe parallel audio-visual data into characters through a combination of CTC and seq2seq decoding.
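The hybrid objective mentioned here is typically an interpolation of a CTC loss on the encoder's frame-level outputs and a cross-entropy loss on the attention decoder's per-character predictions. A minimal PyTorch sketch, assuming random dummy tensors in place of a real audio-visual front-end and a hypothetical interpolation weight `lam`:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

T, N, C, S = 12, 2, 10, 4  # frames, batch size, classes (incl. blank=0), target length

# Frame-level log-probabilities, standing in for the fused audio-visual encoder output.
log_probs = torch.randn(T, N, C).log_softmax(dim=-1)

# Dummy character targets (labels 1..C-1; index 0 is reserved as the CTC blank).
targets = torch.randint(1, C, (N, S))
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), S, dtype=torch.long)

# CTC branch: alignment-free loss over the encoder's frame-level distribution.
ctc_loss = nn.CTCLoss(blank=0)(log_probs, targets, input_lengths, target_lengths)

# Seq2seq branch: per-step decoder logits scored with cross-entropy.
dec_logits = torch.randn(N, S, C)  # stands in for attention-decoder outputs
att_loss = nn.CrossEntropyLoss()(dec_logits.reshape(N * S, C), targets.reshape(N * S))

lam = 0.3  # hypothetical interpolation weight between the two objectives
loss = lam * ctc_loss + (1 - lam) * att_loss
print(float(loss))
```

This is a sketch of the general hybrid CTC/attention recipe, not the paper's exact implementation; the weight, blank index, and decoder shapes are illustrative assumptions.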
Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video
By learning the interrelationship through the associative bridge, the proposed bridging framework can obtain the target-modality representations inside the memory network even when given only the source-modality input, providing rich information for its downstream tasks.
Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading
With the multi-head key memories, MVM extracts candidate audio features from the memory, allowing the lip reading model to consider which pronunciations the input lip movements could represent.