Gaze following has attracted considerable attention in recent years, yet existing datasets commonly lack audio information. In this work, we collect the first gaze following dataset that contains audio, the VideoGazeSpeech dataset. We use this dataset to evaluate our method and to encourage future research on multi-modal gaze following. The dataset comprises $35,231$ frames across $29$ videos. Each video has an average duration of approximately $20$ seconds and is recorded at $25$ frames per second (fps). Every video has a resolution of $1280 \times 720$ pixels, and the entire dataset occupies $7.2$ GB of storage.