no code implementations • CVPR 2018 • Hanzhang Wang, Hanli Wang, Kaisheng Xu
Vision-to-language tasks require a unified semantic understanding of visual content.
Clustering Image Captioning +3