1 code implementation • 10 Aug 2023 • Zezhong Lv, Bing Su, Ji-Rong Wen
Finally, by suppressing the unimodal effect of the masked query, we can rectify the reconstructions of video proposals to perform reasonable contrastive learning.
no code implementations • ICCV 2023 • Heng Zhang, Daqing Liu, Zezhong Lv, Bing Su, DaCheng Tao
Paired video and language data are naturally temporally concurrent, which requires simultaneously modeling the temporal dynamics within each modality and the temporal alignment across modalities.