no code implementations • 19 Feb 2024 • Shigeki Saito, Kazuki Hayashi, Yusuke Ide, Yusuke Sakai, Kazuma Onishi, Toma Suzuki, Seiji Gobara, Hidetaka Kamigaito, Katsuhiko Hayashi, Taro Watanabe
Large-scale vision language models (LVLMs) are language models capable of processing both image and text inputs within a single model.