A large-scale dataset for retrieval and event localisation in video. A unique feature of the dataset is the availability of two audio tracks for each video: the original audio, and a high-quality spoken description of the visual content.
Source: QuerYD: A video dataset with high-quality textual and audio narrations