1 code implementation • 21 Jul 2023 • Kathleen M. Lewis, Emily Mu, Adrian V. Dalca, John Guttag
We demonstrate the utility of GIST by fine-tuning vision-language models on the image-and-generated-text pairs to learn an aligned vision-language representation space for improved classification.
Fine-Grained Image Classification, Image-text Classification
no code implementations • 6 Jul 2023 • Emily Mu, John Guttag, Maggie Makar
Given a similarity metric, contrastive methods learn a representation in which examples that are similar are pushed together and examples that are dissimilar are pulled apart.
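As a rough illustration of the general idea in the excerpt above, the sketch below implements a classic margin-based pairwise contrastive loss. It is not taken from the listed paper and may differ from its objective. The function names, the Euclidean distance, and the default `margin` are all assumptions chosen for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_contrastive_loss(a, b, similar, margin=1.0):
    """Margin-based contrastive loss for one pair of embeddings.

    Illustrative assumption, not the paper's loss:
    - similar pairs are penalized by their squared distance,
      so minimizing the loss moves them closer together;
    - dissimilar pairs are penalized only while they are closer
      than `margin`, so minimizing the loss moves them apart.
    """
    d = euclidean(a, b)
    if similar:
        return d ** 2
    return max(0.0, margin - d) ** 2


if __name__ == "__main__":
    anchor = [0.0, 0.0]
    # A similar pair that is far apart incurs a large loss.
    print(pairwise_contrastive_loss(anchor, [3.0, 4.0], similar=True))
    # A dissimilar pair already beyond the margin incurs no loss.
    print(pairwise_contrastive_loss(anchor, [3.0, 4.0], similar=False))
```

In this formulation the similarity metric enters only through the `similar` flag. A method that uses several similarity notions would need one such label, or one weighting, per metric.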