A Channel Mix Method for Fine-Grained Cross-Modal Retrieval

In this paper, we propose a simple but effective method for the challenging fine-grained cross-modal retrieval task, which aims to enable flexible retrieval among subordinate categories across different modalities. Specifically, to enhance information interaction between modalities for fine-grained objects, we develop a channel mix method that operates on the channels of deep activations across different modalities. A 1×1 convolution is then employed to aggregate the mixed channels into a unified feature vector. Moreover, equipped with a novel fine-grained cross-modal center loss, our method further improves intra-class compactness as well as inter-class separability across modalities. Experiments conducted on the fine-grained cross-modal benchmark dataset show our superiority over competing methods, and ablation studies demonstrate the effectiveness of our proposals.
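The pipeline sketched in the abstract (mix channels across modalities, aggregate with a 1×1 convolution, and pull same-class features toward shared centers) can be illustrated with a minimal NumPy sketch. The mixing pattern (simple channel interleaving), all shapes, and the loss form are illustrative assumptions, not the paper's exact design.

```python
# Illustrative sketch: channel mixing across two modalities, 1x1-conv
# aggregation, and a center-loss-style objective with centers shared
# across modalities. Shapes and the interleaving scheme are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def channel_mix(feat_a, feat_b):
    """Interleave the channels of two modality activations.

    feat_a, feat_b: (C, H, W) deep activations from two modalities.
    Returns a (2C, H, W) tensor whose channels alternate between modalities.
    """
    c, h, w = feat_a.shape
    mixed = np.empty((2 * c, h, w), dtype=feat_a.dtype)
    mixed[0::2] = feat_a  # even channels from modality A
    mixed[1::2] = feat_b  # odd channels from modality B
    return mixed

def conv1x1(x, weight):
    """A 1x1 convolution is a linear map over the channel dimension.

    x: (C_in, H, W), weight: (C_out, C_in) -> returns (C_out, H, W).
    """
    c_in, h, w = x.shape
    return (weight @ x.reshape(c_in, -1)).reshape(-1, h, w)

def center_loss(features, labels, centers):
    """Mean squared distance of each feature to its class center.

    Sharing `centers` across modalities encourages features of the same
    fine-grained class to cluster together regardless of modality.
    """
    diffs = features - centers[labels]
    return 0.5 * float(np.mean(np.sum(diffs ** 2, axis=1)))

# Toy run: C=4 channels, 8x8 spatial maps, aggregated to a 6-d vector.
img = rng.standard_normal((4, 8, 8))
txt = rng.standard_normal((4, 8, 8))
mixed = channel_mix(img, txt)                   # (8, 8, 8)
w = rng.standard_normal((6, 8))
unified = conv1x1(mixed, w).mean(axis=(1, 2))   # global-average-pooled, (6,)

feats = rng.standard_normal((5, 6))             # a batch of unified vectors
labels = np.array([0, 1, 0, 1, 0])
centers = np.zeros((2, 6))                      # one shared center per class
loss = center_loss(feats, labels, centers)
print(mixed.shape, unified.shape, loss)
```

In a real model the mixing would act on backbone activations and the centers would be learned jointly with the network; here zero-initialized centers simply make the loss value easy to inspect.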

