We propose a deep convolutional neural network architecture codenamed "Inception", which set a new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC 2014).
Our approach leverages datasets of images and their sentence descriptions to learn about the inter-modal correspondences between language and visual data.
SOTA for Text-Image Retrieval on COCO
We propose a novel paradigm for evaluating image descriptions that uses human consensus.
In this paper, we propose an effective feature representation called Local Maximal Occurrence (LOMO), and a subspace and metric learning method called Cross-view Quadratic Discriminant Analysis (XQDA).
#45 best model for Person Re-Identification on DukeMTMC-reID
The current leading approaches for semantic segmentation exploit shape information by extracting CNN features from masked image regions.
#30 best model for Semantic Segmentation on PASCAL Context
Correlation clustering, or multicut partitioning, is widely used in image segmentation for partitioning an undirected graph or image with positive and negative edge weights such that the sum of cut edge weights is minimized.
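To make the objective above concrete, here is a minimal sketch that evaluates the sum of cut edge weights for a given node labeling and finds the minimizer by exhaustive search. The toy graph, the function names `cut_cost` and `brute_force_multicut`, and the brute-force strategy are all illustrative assumptions, not the method of any paper listed here; exhaustive search is only feasible for a handful of nodes.

```python
from itertools import product


def cut_cost(edges, labels):
    """Sum the weights of edges whose endpoints lie in different clusters."""
    return sum(w for (u, v), w in edges.items() if labels[u] != labels[v])


def brute_force_multicut(num_nodes, edges):
    """Try every labeling and keep the one with the smallest cut cost.

    Illustration only: the search space grows as num_nodes ** num_nodes.
    """
    best_labels, best_cost = None, float("inf")
    for labels in product(range(num_nodes), repeat=num_nodes):
        cost = cut_cost(edges, labels)
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels, best_cost


# Hypothetical toy graph: positive weights attract, negative weights repel.
edges = {(0, 1): 2.0, (1, 2): -3.0, (0, 2): -1.0, (2, 3): 1.5}
labels, cost = brute_force_multicut(4, edges)
print(labels, cost)
```

On this toy graph the minimizer cuts both negative edges and keeps both positive edges intact, grouping nodes 0 and 1 together and nodes 2 and 3 together. Practical solvers replace the exhaustive loop with integer programming or heuristic methods.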