A dataset for Visual Voice Activity Detection extracted from the LRS3 dataset.
The dataset contains data to train a Visual Voice Activity Detection (VVAD) model. The data comes in four different flavors:
- faceImages: A series of images of faces, with the corresponding label True for speaking and False for not speaking.
- lipImages: A series of images of lips, with the corresponding label True for speaking and False for not speaking.
- faceFeatures: A series of feature maps of faces, extracted with dlib's face landmark detection, with the corresponding label True for speaking and False for not speaking.
- lipFeatures: A series of feature maps of lips, extracted with dlib's face landmark detection, with the corresponding label True for speaking and False for not speaking.
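The structure described above (a series of frames paired with one boolean label) could be modeled roughly as follows. This is only an illustrative sketch: the `Sample` class, its field names, and the `label_counts` helper are hypothetical and not part of the dataset's actual API, and the frame contents are placeholders because the real image and feature shapes are not stated here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Sequence

# The four flavor names listed in the description above.
FLAVORS = ("faceImages", "lipImages", "faceFeatures", "lipFeatures")


@dataclass
class Sample:
    """Hypothetical container for one labeled sequence (an assumption, not the dataset's API)."""

    flavor: str       # one of FLAVORS
    frames: Sequence  # one entry per time step: an image or a feature map
    label: bool       # True for speaking, False for not speaking

    def __post_init__(self) -> None:
        # Reject flavor names that the description does not list.
        if self.flavor not in FLAVORS:
            raise ValueError(f"unknown flavor: {self.flavor!r}")


def label_counts(samples: Iterable[Sample]) -> Dict[str, int]:
    """Count speaking and not-speaking samples, e.g. to check class balance."""
    counts = {"speaking": 0, "not_speaking": 0}
    for sample in samples:
        counts["speaking" if sample.label else "not_speaking"] += 1
    return counts
```

For example, calling `label_counts` on a list of such samples gives a quick view of how balanced the two classes are before training.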