18 Aug 2023 • Yeming Chen, Siyu Zhang, Yaoru Sun, Weijian Liang, Haoran Wang
In this work, we propose an efficient computation framework for multimodal alignment that introduces a novel visual semantic module to further improve performance on vision-language (VL) tasks.