Similarity Reasoning and Filtration for Image-Text Matching

5 Jan 2021 · Haiwen Diao, Ying Zhang, Lin Ma, Huchuan Lu

Image-text matching plays a critical role in bridging vision and language, and great progress has been made by exploiting either the global alignment between an image and a sentence or the local alignments between image regions and words. However, how to make the most of these alignments to infer more accurate matching scores is still underexplored...
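The abstract contrasts two families of prior approaches: a global alignment between a whole image and a whole sentence, and local alignments between image regions and words. The sketch below illustrates that distinction with plain cosine similarity over hypothetical region and word feature vectors. The mean pooling, the max-over-regions rule, and all function names are assumptions chosen for illustration; this is a generic baseline, not SGRAF's similarity reasoning or filtration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean_pool(vectors: Sequence[Vector]) -> List[float]:
    """Average a set of feature vectors into one holistic vector (assumed pooling)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def global_alignment(regions: Sequence[Vector], words: Sequence[Vector]) -> float:
    """Global score: compare one pooled image vector with one pooled sentence vector."""
    return cosine(mean_pool(regions), mean_pool(words))


def local_alignment(regions: Sequence[Vector], words: Sequence[Vector]) -> float:
    """Local score: each word keeps its best-matching region, then average over words."""
    return sum(max(cosine(w, r) for r in regions) for w in words) / len(words)
```

With regions `[[1, 0], [0, 1]]` and a single word `[1, 0]`, `global_alignment` returns about 0.707 because mean pooling blends both regions, while `local_alignment` returns 1.0 because the word is matched to the one region it aligns with exactly.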

Results from the Paper


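Task                   Dataset            Model  Metric              Value  Global rank
Cross-Modal Retrieval  COCO 2014          SGRAF  Image-to-text R@1   57.8   #2
Cross-Modal Retrieval  COCO 2014          SGRAF  Image-to-text R@5   84.9   #2
Cross-Modal Retrieval  COCO 2014          SGRAF  Image-to-text R@10  91.6   #2
Cross-Modal Retrieval  COCO 2014          SGRAF  Text-to-image R@1   41.9   #3
Cross-Modal Retrieval  COCO 2014          SGRAF  Text-to-image R@5   70.7   #3
Cross-Modal Retrieval  COCO 2014          SGRAF  Text-to-image R@10  81.3   #3
Cross-Modal Retrieval  Flickr30k          SGRAF  Image-to-text R@1   77.8   #2
Cross-Modal Retrieval  Flickr30k          SGRAF  Image-to-text R@5   94.1   #3
Cross-Modal Retrieval  Flickr30k          SGRAF  Image-to-text R@10  97.4   #2
Cross-Modal Retrieval  Flickr30k          SGRAF  Text-to-image R@1   58.5   #2
Cross-Modal Retrieval  Flickr30k          SGRAF  Text-to-image R@5   83.0   #2
Cross-Modal Retrieval  Flickr30k          SGRAF  Text-to-image R@10  88.8   #3
Image Retrieval        Flickr30K 1K test  SGRAF  R@1                 58.5   #1
Image Retrieval        Flickr30K 1K test  SGRAF  R@5                 83.0   #2
Image Retrieval        Flickr30K 1K test  SGRAF  R@10                88.8   #2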
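R@K (Recall@K) in the results is the percentage of queries for which a correct match appears among the top K retrieved items. The following is a minimal Python sketch of how such numbers are commonly computed from an image-by-caption similarity matrix. It assumes a fixed number of captions per image stored contiguously (captions i*c to i*c + c - 1 belong to image i, with c = 5 as a default); the function names and this data layout are illustrative assumptions, not the paper's evaluation code.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def recall_at_k_i2t(sims: Matrix, k: int, caps_per_image: int = 5) -> float:
    """Image-to-text Recall@K in percent.

    sims[i][j] is the similarity between image i and caption j.
    Assumed layout: caption j belongs to image j // caps_per_image.
    """
    hits = 0
    for i, row in enumerate(sims):
        # Rank every caption by descending similarity to image i.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        # A hit if any ground-truth caption of image i is in the top K.
        if any(j // caps_per_image == i for j in ranked[:k]):
            hits += 1
    return 100.0 * hits / len(sims)


def recall_at_k_t2i(sims: Matrix, k: int, caps_per_image: int = 5) -> float:
    """Text-to-image Recall@K in percent, using the same assumed layout."""
    n_images = len(sims)
    n_captions = len(sims[0])
    hits = 0
    for j in range(n_captions):
        # Column j holds the similarity of caption j to every image.
        ranked = sorted(range(n_images), key=lambda i: sims[i][j], reverse=True)
        # A hit if the caption's own image is among the top K images.
        if j // caps_per_image in ranked[:k]:
            hits += 1
    return 100.0 * hits / n_captions
```

For example, with two images, two captions each (`caps_per_image=2`), and `sims = [[0.9, 0.2, 0.8, 0.1], [0.3, 0.7, 0.6, 0.5]]`, image-to-text R@1 is 50.0 (image 1's top caption belongs to image 0) and R@2 is 100.0. Ranking by sorting the full row is simple and adequate for a sketch; large test sets would normally use vectorized top-K selection instead.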
