Image-text matching
84 papers with code • 1 benchmark • 1 dataset
Libraries
Use these libraries to find Image-text matching models and implementations.
Most implemented papers
Visual Semantic Reasoning for Image-Text Matching
It outperforms the current best method by 6.8% relatively for image retrieval and 4.8% relatively for caption retrieval on MS-COCO (Recall@1 using the 1K test set).
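The Recall@K metric cited above is the standard retrieval measure on MS-COCO. A minimal sketch of how it is typically computed, assuming a score matrix whose ground-truth match for query i sits at gallery index i (this is an illustration, not any specific paper's evaluation code):

```python
import numpy as np

def recall_at_k(scores: np.ndarray, k: int = 1) -> float:
    """Fraction of queries whose ground-truth item appears in the top-k results.

    scores: (n_queries, n_gallery) similarity matrix; the correct match
    for query i is assumed to be gallery item i.
    """
    # Rank gallery items for each query, highest score first.
    ranking = np.argsort(-scores, axis=1)
    targets = np.arange(scores.shape[0])
    # A hit if the target index appears among the first k ranked columns.
    hits = (ranking[:, :k] == targets[:, None]).any(axis=1)
    return float(hits.mean())
```

With a 2x2 score matrix where both diagonal scores dominate, `recall_at_k(scores, 1)` returns 1.0; reported numbers like "Recall@1 on the 1K test set" are this quantity averaged over the test queries.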
ECCV Caption: Correcting False Negatives by Collecting Machine-and-Human-verified Image-Caption Associations for MS-COCO
Image-Text matching (ITM) is a common task for evaluating the quality of Vision and Language (VL) models.
Dissecting Deep Metric Learning Losses for Image-Text Retrieval
When the gradients are not integrable into a valid loss function, we implement our proposed objectives so that they operate directly in the gradient space, rather than on the losses in the embedding space.
Self-supervised vision-language pretraining for Medical visual question answering
Medical visual question answering (VQA) is the task of answering clinical questions given a radiographic image; it is a challenging problem that requires a model to integrate both vision and language information.
A Differentiable Semantic Metric Approximation in Probabilistic Embedding for Cross-Modal Retrieval
To verify the effectiveness of our approach, extensive experiments are conducted on MS-COCO, CUB Captions, and Flickr30K, which are commonly used in cross-modal retrieval.
Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations
In this paper, we present an end-to-end framework Structure-CLIP, which integrates Scene Graph Knowledge (SGK) to enhance multi-modal structured representations.
Learning Two-Branch Neural Networks for Image-Text Matching Tasks
Image-language matching tasks have recently attracted a lot of attention in the computer vision field.
Deep Cross-Modal Projection Learning for Image-Text Matching
The key point of image-text matching is how to accurately measure the similarity between visual and textual inputs.
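A common baseline for measuring that similarity is cosine similarity between image and text embeddings projected into a shared space. A minimal sketch, with the encoders that produce the embeddings assumed and not shown:

```python
import numpy as np

def cosine_scores(img_emb: np.ndarray, txt_emb: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between image and text embeddings.

    img_emb: (n_images, d), txt_emb: (n_texts, d) -- both assumed to be
    outputs of encoders mapping into the same d-dimensional space.
    """
    # L2-normalize each row so the dot product equals cosine similarity.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    return img @ txt.T  # (n_images, n_texts) score matrix
```

The resulting score matrix is what retrieval metrics such as Recall@K are computed over; methods in this list differ mainly in how the embeddings are produced and which loss shapes this score.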
Position Focused Attention Network for Image-Text Matching
An attention mechanism is then proposed to model the relations between each image region and its position blocks, generating a position feature that is further used to enhance the region representation and to model a more reliable relationship between the visual image and the textual sentence.
Learning fragment self-attention embeddings for image-text matching
In this paper, we propose Self-Attention Embeddings (SAEM) to exploit fragment relations in images or texts by self-attention mechanism, and aggregate fragment information into visual and textual embeddings.
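Fragment self-attention of this kind can be sketched as scaled dot-product attention over a set of fragment embeddings (image regions or words), followed by pooling into a single embedding. Projection matrices are omitted for brevity; this is an illustration of the general mechanism, not the paper's exact SAEM formulation:

```python
import numpy as np

def self_attend(fragments: np.ndarray) -> np.ndarray:
    """Contextualize fragment embeddings via self-attention, then mean-pool.

    fragments: (n_fragments, d) region or word vectors.
    Returns a single (d,) aggregated embedding.
    """
    d = fragments.shape[1]
    # Pairwise relation scores between fragments (queries = keys = inputs).
    logits = fragments @ fragments.T / np.sqrt(d)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    attended = weights @ fragments  # each fragment mixed with its relations
    return attended.mean(axis=0)    # aggregate into one embedding
```

The same routine can be applied independently to the image side and the text side, with the two pooled vectors then compared by a similarity score such as cosine similarity.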