no code implementations • 2 Dec 2019 • Tianqi Liu, Qizhan Shao
We first pre-train the model on the 4M training video-level data, and then fine-tune the model on 237K annotated video segment-level data.
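The two-stage schedule described here (pre-train on a large, coarsely labeled set, then fine-tune the same weights on a small, finely labeled set) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' model: the logistic-regression classifier, the synthetic data, the helper names (`train`, `make_data`, `accuracy`), the dataset sizes, and the learning rates are all hypothetical stand-ins.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def train(weights, data, lr, epochs):
    """Plain SGD for logistic regression, starting from `weights`."""
    w = list(weights)  # copy so the caller's weights are not mutated
    for _ in range(epochs):
        for x, label in data:
            err = predict(w, x) - label
            for i, xi in enumerate(x):
                w[i] -= lr * err * xi
    return w


def make_data(n, noise, seed):
    """Synthetic 2-D points (plus a bias term); a fraction `noise` of labels is flipped."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1), 1.0]
        label = 1.0 if x[0] + x[1] > 0 else 0.0
        if rng.random() < noise:
            label = 1.0 - label
        data.append((x, label))
    return data


def accuracy(w, data):
    hits = sum((predict(w, x) > 0.5) == (label == 1.0) for x, label in data)
    return hits / len(data)


# Hypothetical stand-ins: a large, noisy "video-level" set and a
# small, clean "segment-level" set.
video_level = make_data(2000, noise=0.2, seed=0)
segment_level = make_data(100, noise=0.0, seed=1)

# Stage 1: pre-train from scratch on the large, weakly labeled set.
pretrained = train([0.0, 0.0, 0.0], video_level, lr=0.1, epochs=3)

# Stage 2: fine-tune the same weights on the small, clean set,
# using a smaller learning rate (an assumption, not from the source).
finetuned = train(pretrained, segment_level, lr=0.01, epochs=5)

print("segment-level accuracy after pre-training:", accuracy(pretrained, segment_level))
print("segment-level accuracy after fine-tuning: ", accuracy(finetuned, segment_level))
```

The only point the sketch is meant to convey is that stage 2 starts from the stage 1 weights rather than from a fresh initialization.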
General Classification • Video Understanding