no code implementations • 10 Mar 2023 • Jiaqi Xu, Bo Liu, Yunkuo Chen, Mengli Cheng, Xing Shi
Specifically, we design a Text-Guided MultiWay-Sampler, built on adapt-pooling residual mapping and self-attention modules, that samples long sequences and fuses multi-modal features. This reduces computational cost and addresses the performance degradation caused by previous samplers.
Ranked #1 on TGIF-Transition on TGIF-QA (using extra training data)
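
The abstract names the sampler's ingredients (adaptive pooling, a residual mapping, self-attention, text guidance) but not how they are wired together. The following is a minimal sketch of one plausible reading, not the authors' implementation: a long sequence of video features is average-pooled to a fixed number of tokens, the pooled tokens attend over themselves plus the text tokens, and the attention output is added back to the pooled tokens as a residual. The function names (`adaptive_avg_pool`, `attend`, `text_guided_sample`), the absence of learned projections, and the single attention head are all assumptions made for illustration.

```python
import math


def adaptive_avg_pool(seq, out_len):
    """Average-pool a list of feature vectors down to out_len vectors.

    Bin i covers indices floor(i*n/out_len) up to ceil((i+1)*n/out_len),
    so bins may overlap when n is not a multiple of out_len.
    """
    n, dim = len(seq), len(seq[0])
    pooled = []
    for i in range(out_len):
        start = (i * n) // out_len
        end = -((-(i + 1) * n) // out_len)  # ceiling division
        window = seq[start:end]
        pooled.append([sum(v[d] for v in window) / len(window) for d in range(dim)])
    return pooled


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys_values):
    """Single-head scaled dot-product attention with no learned projections
    (an assumption; a real module would learn query/key/value weights)."""
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * kv[d] for w, kv in zip(weights, keys_values)) for d in range(dim)])
    return out


def text_guided_sample(video_feats, text_feats, num_samples):
    """Hypothetical sampler: pool, attend over pooled video + text tokens,
    then add the attention output to the pooled tokens (residual path)."""
    pooled = adaptive_avg_pool(video_feats, num_samples)
    context = pooled + text_feats  # joint sequence of condensed video and text tokens
    attended = attend(pooled, context)
    return [[p + a for p, a in zip(pv, av)] for pv, av in zip(pooled, attended)]


if __name__ == "__main__":
    video = [[float(i), 1.0] for i in range(8)]  # 8 frames, 2-dim toy features
    text = [[1.0, 0.0]]                          # 1 text token, same dimension
    sampled = text_guided_sample(video, text, num_samples=4)
    print(len(sampled), len(sampled[0]))
```

In this sketch the cost saving comes from attention running over `num_samples` pooled tokens rather than every frame, while the residual path keeps the pooled features intact even if the attention output is uninformative.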