Max-Shot Cross-Lingual Visual Reasoning