no code implementations • 5 Apr 2024 • Zitao Shuai, Liyue Shen
Vision-language pre-training (VLP) has arisen as an efficient scheme for multimodal representation learning, but it requires large-scale multimodal data for pre-training, which is an obstacle, especially for biomedical applications.