A large, realistic multimodal dataset consisting of real personal photos paired with crowd-sourced question-answer pairs.
5 PAPERS • 1 BENCHMARK