VVAD-LRS3

Introduced by Lubitz et al. in "The VVAD-LRS3 Dataset for Visual Voice Activity Detection"

A dataset for Visual Voice Activity Detection extracted from the LRS3 dataset.

The dataset contains data for training a Visual Voice Activity Detection (VVAD) model. The data comes in four flavors:

faceImages: A series of face images, labeled True for speaking and False for not speaking.
lipImages: A series of lip images, labeled True for speaking and False for not speaking.
faceFeatures: A series of feature maps of faces, extracted with dlib's face landmark detection, labeled True for speaking and False for not speaking.
lipFeatures: A series of feature maps of lips, extracted with dlib's face landmark detection, labeled True for speaking and False for not speaking.
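
To make the structure of the four flavors concrete, here is a minimal sketch of how one labeled sequence might be represented in Python. The `VVADSample` container and the `positive_fraction` helper are hypothetical names chosen for illustration. They are not part of the dataset or its tooling, and the sketch assumes nothing about the on-disk file format, the sequence length, or the image and feature dimensions.

```python
from dataclasses import dataclass
from typing import List, Sequence

# The four flavors named in the dataset description.
FLAVORS = ("faceImages", "lipImages", "faceFeatures", "lipFeatures")


@dataclass
class VVADSample:
    """One labeled sequence (hypothetical container, not the dataset's own format)."""

    flavor: str        # one of FLAVORS
    frames: Sequence   # one entry per time step: an image or a feature map
    speaking: bool     # True for speaking, False for not speaking

    def __post_init__(self) -> None:
        # Reject flavor names that the dataset description does not list.
        if self.flavor not in FLAVORS:
            raise ValueError(f"unknown flavor: {self.flavor!r}")


def positive_fraction(samples: List[VVADSample]) -> float:
    """Return the fraction of samples labeled True (speaking)."""
    if not samples:
        raise ValueError("no samples given")
    return sum(s.speaking for s in samples) / len(samples)


# Usage with placeholder frames (real frames would be images or landmark maps).
demo = [
    VVADSample("lipImages", frames=[0, 1, 2], speaking=True),
    VVADSample("lipImages", frames=[3, 4, 5], speaking=False),
]
print(positive_fraction(demo))  # 0.5
```

Checking the label balance this way is a common first step before training a binary classifier, since a skewed split between True and False samples affects how accuracy should be read.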

Source: https://arxiv.org/pdf/2109.13789v1.pdf.