no code implementations • 5 Mar 2024 • Yuan Gao, Kunyu Shi, Pengkai Zhu, Edouard Belval, Oren Nuriel, Srikar Appalaraju, Shabnam Ghadar, Vijay Mahadevan, Zhuowen Tu, Stefano Soatto
We propose Strongly Supervised pre-training with ScreenShots (S4), a novel pre-training paradigm for Vision-Language Models that uses data from large-scale web screenshot rendering.
no code implementations • 17 May 2022 • Thomas Delteil, Edouard Belval, Lei Chen, Luis Goncalves, Vijay Mahadevan
In these, text semantics and visual information supplement each other to provide a global understanding of the document.