OLKAVS (An Open Large-Scale Korean Audio-Visual Speech Dataset)

Introduced by Park et al. in OLKAVS: An Open Large-Scale Korean Audio-Visual Speech Dataset

The dataset contains 1,150 hours of transcribed audio from 1,107 Korean speakers, recorded in a studio setup from nine different viewpoints and under various noise conditions. The authors also provide pre-trained baseline models for two tasks: audio-visual speech recognition and lip reading.

Papers



Dataset Loaders


No data loaders found.
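
Since no loader is listed, the sketch below shows one way the audio, video, and transcript clips described above could be wrapped in a PyTorch Dataset. The directory layout, file extensions, and clip naming are assumptions made purely for illustration; they are not the official OLKAVS distribution format.

import os
import glob

import torchaudio
import torchvision
from torch.utils.data import Dataset

class OLKAVSClips(Dataset):
    """Yields (waveform, video_frames, transcript) triples.

    Assumes a hypothetical flat layout: <root>/<clip_id>.wav, .mp4, .txt
    """

    def __init__(self, root):
        self.root = root
        # Index clips by the basename of each audio file.
        self.clip_ids = sorted(
            os.path.splitext(os.path.basename(path))[0]
            for path in glob.glob(os.path.join(root, "*.wav"))
        )

    def __len__(self):
        return len(self.clip_ids)

    def __getitem__(self, idx):
        stem = os.path.join(self.root, self.clip_ids[idx])
        # Audio: (channels, samples) tensor plus sample rate.
        waveform, sample_rate = torchaudio.load(stem + ".wav")
        # Video: (frames, height, width, channels) uint8 tensor.
        frames, _, _ = torchvision.io.read_video(stem + ".mp4", pts_unit="sec")
        # Transcript: plain UTF-8 Korean text.
        with open(stem + ".txt", encoding="utf-8") as f:
            transcript = f.read().strip()
        return waveform, frames, transcript

Such a dataset can then be batched with torch.utils.data.DataLoader using a custom collate function that pads the variable-length audio and video sequences.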

Tasks


  • Audio-Visual Speech Recognition
  • Lip Reading

Similar Datasets


License


  • Unknown

Modalities


  • Audio
  • Video
  • Text

Languages


  • Korean