Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training

CVPR 2020 · Weituo Hao, Chunyuan Li, Xiujun Li, Lawrence Carin, Jianfeng Gao

Learning to navigate in a visual environment following natural-language instructions is a challenging task, because the multimodal inputs to the agent are highly variable, and the training data on a new task is often limited. In this paper, we present the first pre-training and fine-tuning paradigm for vision-and-language navigation (VLN) tasks...


Results from the Paper

TASK               DATASET                                     MODEL      METRIC NAME    METRIC VALUE  GLOBAL RANK
Visual Navigation  Cooperative Vision-and-Dialogue Navigation  Prevalent  Goal Progress  2.44          #1
Visual Navigation  Help, Anna! (HANNA)                         Prevalent  SPL            28.72         #1
Visual Navigation  Room-to-Room                                Prevalent  SPL            0.51          #1
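
The two metrics in the table can be made concrete with a short sketch. This is an illustrative assumption, not code from the paper: it uses the standard SPL (Success weighted by Path Length) definition from the embodied-navigation literature, SPL = (1/N) Σ S_i · l_i / max(p_i, l_i), where S_i is 1 if episode i succeeded and 0 otherwise, l_i is the shortest-path length from start to goal, and p_i is the length of the path the agent actually took. It also assumes Goal Progress follows the Cooperative Vision-and-Dialogue Navigation benchmark's usual definition: the mean reduction in distance to the goal, in meters. The names `Episode`, `spl`, and `goal_progress` are hypothetical. Note that the table appears to report HANNA SPL on a 0-100 scale and Room-to-Room SPL on a 0-1 scale.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Episode:
    # Hypothetical per-episode record (assumption, not from the paper).
    success: bool             # S_i: did the agent stop close enough to the goal?
    shortest_path_len: float  # l_i: shortest-path length from start to goal (meters)
    agent_path_len: float     # p_i: length of the path the agent took (meters)


def spl(episodes: Sequence[Episode]) -> float:
    """Success weighted by Path Length, on a 0-1 scale."""
    if not episodes:
        raise ValueError("need at least one episode")
    total = 0.0
    for ep in episodes:
        if ep.success:
            # A successful episode earns l_i / max(p_i, l_i), which is at most 1;
            # failed episodes contribute 0.
            total += ep.shortest_path_len / max(ep.agent_path_len, ep.shortest_path_len)
    return total / len(episodes)


def goal_progress(start_to_goal: Sequence[float], end_to_goal: Sequence[float]) -> float:
    """Mean reduction in distance to the goal (meters), assuming the CVDN-style definition."""
    if len(start_to_goal) != len(end_to_goal) or not start_to_goal:
        raise ValueError("need equal-length, non-empty distance lists")
    diffs = [s - e for s, e in zip(start_to_goal, end_to_goal)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    eps = [
        Episode(True, 10.0, 12.5),  # success with a detour: 10 / 12.5 = 0.8
        Episode(False, 8.0, 8.0),   # failure: contributes 0
        Episode(True, 5.0, 5.0),    # success on the shortest path: 1.0
    ]
    print(round(spl(eps), 4))                           # 0.6
    print(goal_progress([10.0, 12.0], [2.0, 6.0]))      # 7.0
```

Multiplying the `spl` result by 100 gives the percentage form that the HANNA row appears to use.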