Instance-aware Image and Sentence Matching with Selective Multimodal LSTM

Effective image and sentence matching depends on how well their global visual-semantic similarity is measured. Based on the observation that such a global similarity arises from a complex aggregation of multiple local similarities between pairwise instances of the image (objects) and the sentence (words), we propose a selective multimodal Long Short-Term Memory network (sm-LSTM) for instance-aware image and sentence matching...

Published at CVPR 2017

Datasets


TASK             DATASET            MODEL          METRIC NAME  METRIC VALUE  GLOBAL RANK
Image Retrieval  Flickr30K 1K test  SM-LSTM (VGG)  R@1          30.2          # 10
Image Retrieval  Flickr30K 1K test  SM-LSTM (VGG)  R@10         72.3          # 8
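
The R@1 and R@10 metrics above are recall at K: the percentage of queries for which the correct item appears among the top K ranked candidates. The following is a minimal sketch of that computation, not the paper's evaluation code. The function name `recall_at_k`, the toy scores, and the simplification that each query has exactly one correct candidate are all assumptions made for illustration.

```python
def recall_at_k(similarity, ground_truth, k):
    """Percentage of queries whose correct candidate is ranked in the top k.

    similarity[i][j]: score between query i and candidate j (higher = more similar).
    ground_truth[i]:  index of the single correct candidate for query i (assumption).
    """
    hits = 0
    for scores, target in zip(similarity, ground_truth):
        # Rank candidate indices by descending similarity score.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy data with made-up scores (not from the paper): 3 sentence queries, 4 images.
toy_similarity = [
    [0.9, 0.1, 0.3, 0.2],  # correct image 0 is ranked 1st
    [0.4, 0.2, 0.8, 0.5],  # correct image 1 is ranked 4th
    [0.1, 0.6, 0.5, 0.3],  # correct image 2 is ranked 2nd
]
toy_ground_truth = [0, 1, 2]

if __name__ == "__main__":
    for k in (1, 2, 4):
        print(f"R@{k} = {recall_at_k(toy_similarity, toy_ground_truth, k):.1f}")
```

Under this definition, an R@1 of 30.2 would mean that the correct image is ranked first for about 30% of the sentence queries, and an R@10 of 72.3 that it appears in the top ten for about 72% of them.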

Methods used in the Paper


METHOD          TYPE
Memory Network  Working Memory Models