Lip Reading

43 papers with code • 3 benchmarks • 5 datasets

Lip Reading is the task of inferring the speech content of a video using only visual information, especially lip movements. It has many practical applications, such as assisting audio-based speech recognition, biometric authentication, and aiding hearing-impaired people.

Source: Mutual Information Maximization for Effective Lip Reading

Neural Text to Articulate Talk: Deep Text to Audiovisual Speech Synthesis achieving both Auditory and Photo-realism

g-milis/NEUTART 11 Dec 2023

Our method, which we call NEUral Text to ARticulate Talk (NEUTART), is a talking face generator that uses a joint audiovisual feature space, as well as speech-informed 3D facial reconstructions and a lip-reading loss for visual supervision.

19

Do VSR Models Generalize Beyond LRS3?

yasserdahouml/vsr_test_set 23 Nov 2023

The Lip Reading Sentences-3 (LRS3) benchmark has primarily been the focus of intense research in visual speech recognition (VSR) during the last few years.

5

Learning Separable Hidden Unit Contributions for Speaker-Adaptive Lip-Reading

jinchiniao/LSHUC 8 Oct 2023

For deep layers, where both the speaker's features and the speech content features are well expressed, we introduce speaker-adaptive features that learn to suppress noise irrelevant to the speech content for robust lip reading.

4

SelfTalk: A Self-Supervised Commutative Training Diagram to Comprehend 3D Talking Faces

psyai-net/SelfTalk_release 19 Jun 2023

To enhance the visual accuracy of generated lip movements while reducing the dependence on labeled data, we propose a novel framework, SelfTalk, which introduces self-supervision into a cross-modal network system to learn 3D talking faces.

104

OpenSR: Open-Modality Speech Recognition via Maintaining Multi-Modality Alignment

exgc/opensr 10 Jun 2023

We demonstrate that OpenSR enables modality transfer from one to any in three different settings (zero-, few- and full-shot), and achieves highly competitive zero-shot performance compared to the existing few-shot and full-shot lip-reading methods.

14

LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading

yochaiye/LipVoicer 5 Jun 2023

We then condition a diffusion model on the video and use the extracted text through a classifier-guidance mechanism where a pre-trained ASR serves as the classifier.

7

A Novel Interpretable and Generalizable Re-synchronization Model for Cued Speech based on a Multi-Cuer Corpus

lufei321/resync-cs 5 Jun 2023

Cued Speech (CS) is a multi-modal visual coding system combining lip reading with several hand cues at the phonetic level to make the spoken language visible to the hearing impaired.

0

Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert

sxjdwang/talklip CVPR 2023

To address the problem, we propose using a lip-reading expert to improve the intelligibility of the generated lip regions by penalizing the incorrect generation results.

338
29 Mar 2023

MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition

rongjiehuang/transpeech ICCV 2023

However, despite researchers exploring cross-lingual translation techniques such as machine translation and audio speech translation to overcome language barriers, there is still a shortage of cross-lingual studies on visual speech.

155
09 Mar 2023

GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis

yerfor/geneface 31 Jan 2023

Generating photo-realistic video portraits from arbitrary speech audio is a crucial problem in film-making and virtual reality.

2,263