A large-scale video dataset featuring movie clips paired with detailed captions.
11 PAPERS • 1 BENCHMARK