no code implementations • 8 Mar 2024 • Pengwei Yin, Guanzhong Zeng, Jingjing Wang, Di Xie
To overcome these limitations, we propose a novel framework called CLIP-Gaze that utilizes a pre-trained vision-language model to leverage its transferable knowledge.