Cross-Modal Retrieval

192 papers with code • 13 benchmarks • 21 datasets

Cross-Modal Retrieval is a retrieval task that spans different modalities, such as image-text, video-text, and audio-text retrieval: a query in one modality is used to find relevant items in another. The main challenge is the modality gap between these data types. The key solution is to map the representations of each modality into a shared subspace, so that the newly generated features can be compared directly with distance metrics such as cosine distance and Euclidean distance.
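A minimal sketch of the shared-subspace idea, assuming NumPy and two hypothetical linear projections (`W_img`, `W_txt`) that stand in for learned modality-specific encoders. The random weights carry no semantics, so the rankings only illustrate the mechanics: text queries and image gallery items are projected into one space and ranked by cosine similarity and by Euclidean distance.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit L2 norm so that dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def rank_by_cosine(queries, gallery):
    """Gallery indices per query, from most to least cosine-similar."""
    sims = l2_normalize(queries) @ l2_normalize(gallery).T
    return np.argsort(-sims, axis=1)


def rank_by_euclidean(queries, gallery):
    """Gallery indices per query, from nearest to farthest (Euclidean distance)."""
    dists = np.linalg.norm(queries[:, None, :] - gallery[None, :, :], axis=-1)
    return np.argsort(dists, axis=1)


rng = np.random.default_rng(0)
d_img, d_txt, d_shared = 512, 300, 64

# Hypothetical projections into the shared subspace (stand-ins for trained encoders).
W_img = rng.normal(size=(d_img, d_shared))
W_txt = rng.normal(size=(d_txt, d_shared))

image_feats = rng.normal(size=(5, d_img))  # 5 gallery images, image-specific features
text_feats = rng.normal(size=(2, d_txt))   # 2 text queries, text-specific features

img_shared = image_feats @ W_img  # shape (5, 64)
txt_shared = text_feats @ W_txt   # shape (2, 64)

cos_order = rank_by_cosine(txt_shared, img_shared)
# Euclidean ranking on unit-normalized embeddings.
l2_order = rank_by_euclidean(l2_normalize(txt_shared), l2_normalize(img_shared))
print(cos_order)
print(l2_order)
```

On unit-normalized vectors the two metrics produce the same ranking, since ||a - b||^2 = 2 - 2 cos(a, b); on unnormalized vectors they can disagree, which is one reason embeddings are often L2-normalized before retrieval.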

Latest papers

TF-CLIP: Learning Text-free CLIP for Video-based Person Re-Identification

asuradayuci/tf-clip 15 Dec 2023

Technically, TMC allows the frame-level memories in a sequence to communicate with each other, and to extract temporal information based on the relations within the sequence.

Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models

aimagelab/safe-clip 27 Nov 2023

We show how this can be done by fine-tuning a CLIP model on synthetic data obtained from a large language model trained to convert between safe and unsafe sentences, and a text-to-image generator.

Contrastive Transformer Learning with Proximity Data Generation for Text-Based Person Search

hcplab-sysu/personsearch-ctlg 15 Nov 2023

Moreover, we propose a proximity data generation (PDG) module to automatically produce more diverse data for cross-modal training.

Weakly supervised cross-modal learning in high-content screening

gwatkinson/jump_models 8 Nov 2023

With the surge in available data from various modalities, there is a growing need to bridge the gap between different data types.

BirdSAT: Cross-View Contrastive Masked Autoencoders for Bird Species Classification and Mapping

mvrl/birdsat 29 Oct 2023

We propose a metadata-aware self-supervised learning (SSL) framework useful for fine-grained classification and ecological mapping of bird species around the world.

A Prior Instruction Representation Framework for Remote Sensing Image-text Retrieval

jaychempan/PIR ACMMM 2023, 27 Oct 2023

Our highlight is the proposal of a paradigm that draws on prior knowledge to instruct adaptive learning of vision and text representations.

InvGC: Robust Cross-Modal Retrieval by Inverse Graph Convolution

yimuwangcs/Better_Cross_Modal_Retrieval 20 Oct 2023

However, a recent study shows that multi-modal data representations tend to cluster within a limited convex cone (the representation degeneration problem), which hinders retrieval performance due to the inseparability of these representations.

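The convex-cone clustering mentioned in the InvGC excerpt can be checked with a simple diagnostic: the average pairwise cosine similarity of a set of embeddings, which is near 0 when directions are spread evenly around the origin and approaches 1 when they crowd into a narrow cone. This sketch assumes NumPy and synthetic data; the mean-centering step is a common baseline shown only for contrast, not the inverse graph convolution proposed in the paper.

```python
import numpy as np


def mean_pairwise_cosine(x):
    """Average cosine similarity over all distinct pairs of rows in x."""
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    sims = unit @ unit.T
    n = len(unit)
    # Drop the diagonal: every vector has cosine similarity 1 with itself.
    return (sims.sum() - np.trace(sims)) / (n * (n - 1))


rng = np.random.default_rng(0)
spread = rng.normal(size=(200, 64))  # directions spread evenly around the origin

# A large shared offset pushes every vector into a narrow cone (synthetic degeneration).
cone = spread + 5.0 * np.ones(64)

# Mean-centering removes the shared offset (a simple baseline, not InvGC).
centered = cone - cone.mean(axis=0)

for name, emb in [("spread", spread), ("cone", cone), ("centered", centered)]:
    print(name, round(float(mean_pairwise_cosine(emb)), 3))
```

A value close to 1 means nearly every pair of embeddings points the same way, so nearest-neighbor rankings are decided by small residual differences, which is the separability issue the excerpt describes.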

Balance Act: Mitigating Hubness in Cross-Modal Retrieval with Query and Gallery Banks

yimuwangcs/Better_Cross_Modal_Retrieval 17 Oct 2023

In this work, we present a post-processing solution to address the hubness problem in cross-modal retrieval, a phenomenon where a small number of gallery data points are frequently retrieved, resulting in a decline in retrieval performance.

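Hubness, as described in the Balance Act excerpt, is commonly quantified through the k-occurrence of each gallery item (how many queries retrieve it among their top-k results); a strongly right-skewed k-occurrence distribution means a few hubs dominate retrieval. This sketch assumes NumPy and synthetic data in which one hypothetical gallery item is placed at the origin, so it lies close to most queries under Euclidean distance. It only measures hubness and does not implement the paper's post-processing method.

```python
import numpy as np


def k_occurrence(sims, k):
    """Count how many queries place each gallery item in their top-k.

    sims: (num_queries, num_gallery) array, higher means more similar.
    """
    topk = np.argsort(-sims, axis=1)[:, :k]
    return np.bincount(topk.ravel(), minlength=sims.shape[1])


def skewness(values):
    """Sample skewness; large positive values indicate a few dominant hubs."""
    x = np.asarray(values, dtype=float)
    centered = x - x.mean()
    return (centered ** 3).mean() / (centered ** 2).mean() ** 1.5


rng = np.random.default_rng(0)
queries = rng.normal(size=(1000, 32))
gallery = rng.normal(size=(100, 32))
gallery[0] = 0.0  # hypothetical hub: the origin is close to most queries

# Negative Euclidean distance serves as the similarity score.
sims = -np.linalg.norm(queries[:, None, :] - gallery[None, :, :], axis=-1)

k = 5
counts = k_occurrence(sims, k)
print("top hub:", counts.argmax(), "retrieved by", counts.max(), "of", len(queries), "queries")
print("k-occurrence skewness:", round(float(skewness(counts)), 2))
```

Since every query contributes exactly k slots, the counts always sum to k times the number of queries; a post-processing step that mitigates hubness would show up here as a flatter distribution and lower skewness.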

PaLI-3 Vision Language Models: Smaller, Faster, Stronger

kyegomez/PALI3 13 Oct 2023

This paper presents PaLI-3, a smaller, faster, and stronger vision language model (VLM) that compares favorably to similar models that are 10x larger.

BioBridge: Bridging Biomedical Foundation Models via Knowledge Graphs

ryanwangzf/biobridge 5 Oct 2023

Foundation models (FMs) are able to leverage large volumes of unlabeled data to demonstrate superior performance across a wide range of tasks.