no code implementations • 8 Jan 2024 • Shuxiao Ma, Linyuan Wang, Senbao Hou, Bin Yan
Next, we use a contrastive loss function to minimize the distance between the image embedding features and the text embedding features, thereby aligning the stimulus image with its text information.
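
The excerpt does not say which contrastive loss is used, so the following is only a minimal sketch under one common assumption: a symmetric, CLIP-style InfoNCE loss over a batch of paired embeddings. The function name `contrastive_alignment_loss`, the `temperature` value of 0.07, and the use of cosine similarity are illustrative choices, not details taken from the paper.

```python
import numpy as np


def _log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def contrastive_alignment_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss (assumed form; hypothetical helper).

    image_emb, text_emb: arrays of shape (batch, dim), where row i of each
    array belongs to the same stimulus (a matched image/text pair).
    """
    # L2-normalize so the dot product is a cosine similarity.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # Pairwise similarities; entry (i, j) compares image i with text j.
    logits = image_emb @ text_emb.T / temperature

    # Matched pairs sit on the diagonal and act as the positive class.
    idx = np.arange(logits.shape[0])
    loss_image_to_text = -_log_softmax(logits, axis=1)[idx, idx].mean()
    loss_text_to_image = -_log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_image_to_text + loss_text_to_image)
```

Minimizing this quantity pulls each image embedding toward its paired text embedding and pushes it away from the other texts in the batch, which is one way to realize the alignment described above.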