A Channel Mix Method for Fine-Grained Cross-Modal Retrieval
In this paper, we propose a simple but effective method for the challenging fine-grained cross-modal retrieval task, which aims to enable flexible retrieval among subordinate categories across different modalities. Specifically, to enhance information interaction between different modalities for fine-grained objects, a channel mix method is developed and performed upon the channels of deep activations across different modalities. After that, a 1×1 convolution is employed to aggregate the mixed channels into a unified feature vector. Moreover, equipped with a novel fine-grained cross-modal center loss, our method can further improve intra-class compactness as well as inter-class separability across modalities. Experiments conducted on the fine-grained cross-modal benchmark dataset show our superiority over competing methods. Meanwhile, ablation studies also demonstrate the effectiveness of our proposals.
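The pipeline sketched in the abstract (mix channels across modalities, then aggregate with a 1×1 convolution) can be illustrated in a minimal NumPy sketch. All shapes, the interleaving rule, and the pooling step are assumptions for illustration, not details taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical deep activations from two modalities, shape (C, H, W).
# The channel count and spatial size are arbitrary assumptions.
C, H, W = 4, 8, 8
img_feat = rng.standard_normal((C, H, W))
txt_feat = rng.standard_normal((C, H, W))

# "Channel mix" (assumed as interleaving): alternate channels from the
# two modalities so adjacent channels carry different-modality signals.
mixed = np.empty((2 * C, H, W))
mixed[0::2] = img_feat
mixed[1::2] = txt_feat

# A 1x1 convolution is a per-position linear map over channels:
# weight shape (C_out, C_in). Here it fuses the 2C mixed channels
# back down to C channels.
w = rng.standard_normal((C, 2 * C)) / np.sqrt(2 * C)
aggregated = np.einsum('oc,chw->ohw', w, mixed)  # (C, H, W)

# Global average pooling yields the unified feature vector.
feature_vector = aggregated.mean(axis=(1, 2))
print(feature_vector.shape)  # (4,)
```

Interleaving is only one possible mixing rule; concatenation or learned channel shuffling would slot into the same structure, with the 1×1 convolution doing the cross-modal aggregation either way.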