1 code implementation • 19 Aug 2022 • Zohreh Ghaderi, Leonard Salewski, Hendrik P. A. Lensch
To generate proper captions for videos, the inference needs to identify relevant concepts and pay attention to the spatial relationships between them as well as to the temporal development in the clip.
Ranked #7 on Video Captioning on VATEX