Lip Reading

46 papers with code • 3 benchmarks • 5 datasets

Lip reading is the task of inferring speech content from a video using only visual information, especially the lip movements. It has many crucial practical applications, such as assisting audio-based speech recognition, biometric authentication, and aiding hearing-impaired people.

Source: Mutual Information Maximization for Effective Lip Reading

GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis

yerfor/geneface 31 Jan 2023

Generating photo-realistic video portraits with arbitrary speech audio is a crucial problem in film-making and virtual reality.

OLKAVS: An Open Large-Scale Korean Audio-Visual Speech Dataset

iip-sogang/olkavs-avspeech 16 Jan 2023

Inspired by how humans comprehend speech in a multi-modal manner, various audio-visual datasets have been constructed.

Audio-Visual Efficient Conformer for Robust Speech Recognition

burchim/avec 4 Jan 2023

We improve previous lip reading methods using an Efficient Conformer back-end on top of a ResNet-18 visual front-end and by adding intermediate CTC losses between blocks.
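Intermediate CTC losses attach auxiliary prediction heads between encoder blocks so that lower layers also receive a direct supervision signal. A minimal sketch of the idea in PyTorch follows; the module and parameter names are illustrative, and plain Transformer layers stand in for the paper's Efficient Conformer blocks:

```python
# Hedged sketch of intermediate CTC taps between encoder blocks.
# Names and the use of TransformerEncoderLayer (instead of Conformer
# blocks) are illustrative assumptions, not the paper's exact design.
import torch
import torch.nn as nn

class InterCTCEncoder(nn.Module):
    """Encoder stack with a CTC projection tapped after intermediate blocks."""

    def __init__(self, d_model, num_blocks, vocab_size, inter_layers=(2,)):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
             for _ in range(num_blocks)]
        )
        self.inter_layers = set(inter_layers)
        self.ctc_head = nn.Linear(d_model, vocab_size)  # shared across taps

    def forward(self, x):  # x: (batch, time, d_model) visual features
        inter_logits = []
        for i, block in enumerate(self.blocks, start=1):
            x = block(x)
            if i in self.inter_layers:
                inter_logits.append(self.ctc_head(x))  # auxiliary tap
        return self.ctc_head(x), inter_logits  # final logits + intermediates
```

The training objective would then combine a CTC loss on the final logits with the averaged CTC losses on the intermediate taps.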

Lip Sync Matters: A Novel Multimodal Forgery Detector

sahibzadaadil/Lip-Sync-Matters-A-Novel-Multimodal-Forgery-Detector APSIPA ASC 2022

Deepfake technology has advanced considerably, but it is a double-edged sword for the community.

07 Nov 2022

Relaxed Attention for Transformer Models

Oguzhanercan/Vision-Transformers 20 Sep 2022

The powerful modeling capabilities of all-attention-based transformer architectures often cause overfitting and, for natural language processing tasks, lead to an implicitly learned internal language model in the autoregressive transformer decoder, complicating the integration of external language models.

Training Strategies for Improved Lip-reading

mpc001/Lipreading_using_Temporal_Convolutional_Networks 3 Sep 2022

In this paper, we systematically investigate the performance of state-of-the-art data augmentation approaches, temporal models and other training strategies, like self-distillation and using word boundary indicators.
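Self-distillation, one of the training strategies investigated, typically combines the ground-truth cross-entropy with a temperature-scaled KL term toward a teacher copy of the same model. A minimal sketch under that common formulation (not necessarily the paper's exact recipe; `T` and `alpha` are assumed hyperparameters):

```python
# Minimal self-distillation objective: CE on labels plus temperature-scaled
# KL divergence to a (frozen) teacher's logits. Hyperparameters are assumptions.
import torch
import torch.nn.functional as F

def self_distillation_loss(student_logits, teacher_logits, targets,
                           T=2.0, alpha=0.5):
    """alpha * CE(student, targets) + (1 - alpha) * T^2 * KL(teacher || student)."""
    ce = F.cross_entropy(student_logits, targets)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale gradients after temperature softening
    return alpha * ce + (1.0 - alpha) * kd
```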

Speaker-adaptive Lip Reading with User-dependent Padding

ms-dot-k/User-dependent-Padding 9 Aug 2022

In this paper, to remedy the performance degradation of lip reading models on unseen speakers, we propose a speaker-adaptive lip reading method, namely user-dependent padding.

Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading

ms-dot-k/Multi-head-Visual-Audio-Memory AAAI 2022

With the multi-head key memories, MVM extracts possible candidate audio features from the memory, which allows the lip reading model to consider which pronunciations the input lip movement could represent.
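The core mechanism is addressing a learned value memory through several key memories in parallel, so that one visual query can recall multiple candidate audio features. A hedged sketch of that retrieval step (module names and per-head value slots are illustrative simplifications, not MVM's exact architecture):

```python
# Illustrative multi-head key/value memory lookup: a visual feature queries
# per-head key memories and recalls stored audio-like value slots.
# All names, sizes, and the per-head value memory are assumptions.
import torch
import torch.nn as nn

class MultiHeadKeyMemory(nn.Module):
    """Soft-attention recall from learned memory slots, one key memory per head."""

    def __init__(self, dim, slots=96, heads=4):
        super().__init__()
        assert dim % heads == 0
        self.heads, self.head_dim = heads, dim // heads
        self.keys = nn.Parameter(torch.randn(heads, slots, self.head_dim))
        self.values = nn.Parameter(torch.randn(heads, slots, self.head_dim))

    def forward(self, visual_feat):  # visual_feat: (batch, dim)
        q = visual_feat.view(-1, self.heads, self.head_dim)          # (B, H, D)
        attn = torch.einsum("bhd,hsd->bhs", q, self.keys).softmax(dim=-1)
        recalled = torch.einsum("bhs,hsd->bhd", attn, self.values)   # soft recall
        return recalled.reshape(-1, self.heads * self.head_dim)     # (B, dim)
```

Each head can attend to different slots, which is what lets the model keep several pronunciation hypotheses alive for homophenes.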

04 Apr 2022

Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video

ms-dot-k/Visual-Audio-Memory ICCV 2021

By learning the interrelationship through the associative bridge, the proposed bridging framework is able to obtain the target modal representations inside the memory network, even with the source modal input only, and it provides rich information for its downstream tasks.

04 Apr 2022

Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition

lumia-group/leveraging-self-supervised-learning-for-avsr ACL 2022

In particular, audio and visual front-ends are trained on large-scale unimodal datasets; we then integrate components of both front-ends into a larger multimodal framework that learns to transcribe parallel audio-visual data into characters through a combination of CTC and seq2seq decoding.
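The combination of CTC and seq2seq objectives is usually realized as a weighted sum of the two losses over the same transcript. A minimal sketch of that hybrid objective (the weight `lam` and the reuse of one target tensor for both branches, without sos/eos handling, are simplifying assumptions):

```python
# Hedged sketch of a hybrid CTC + attention (seq2seq) training objective.
# The interpolation weight and tensor shapes are illustrative assumptions.
import torch
import torch.nn.functional as F

def hybrid_loss(ctc_logits, att_logits, targets, input_lens, target_lens,
                lam=0.3):
    """lam * CTC + (1 - lam) * seq2seq cross-entropy on the same transcript."""
    # CTC branch expects log-probs shaped (time, batch, vocab)
    log_probs = ctc_logits.log_softmax(dim=-1).transpose(0, 1)
    ctc = F.ctc_loss(log_probs, targets, input_lens, target_lens,
                     blank=0, zero_infinity=True)
    # Attention branch: per-token cross-entropy over decoder outputs
    att = F.cross_entropy(att_logits.reshape(-1, att_logits.size(-1)),
                          targets.reshape(-1))
    return lam * ctc + (1.0 - lam) * att
```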

24 Feb 2022