5 dataset results for Vision and Language Navigation AND English

RxR (Room-Across-Room)

Room-Across-Room (RxR) is a multilingual dataset for Vision-and-Language Navigation (VLN) in Matterport3D environments. In contrast to related datasets such as Room-to-Room (R2R), RxR is 10x larger, multilingual (English, Hindi and Telugu), with longer and more variable paths, and it includes fine-grained visual groundings that relate each word to pixels/surfaces in the environment.

44 PAPERS • 1 BENCHMARK

Talk the Walk

Talk the Walk is a large-scale dialogue dataset grounded in action and perception. The task involves two agents (a “guide” and a “tourist”) that communicate via natural language to achieve a common goal: having the tourist navigate to a given target location.

11 PAPERS • NO BENCHMARKS YET

map2seq

map2seq contains 7,672 human-written natural language navigation instructions for routes in OpenStreetMap, with a focus on visual landmarks. The instructions were validated in Street View.

3 PAPERS • 2 BENCHMARKS

WebLINX (Real-World Website Navigation with Multi-Turn)

WebLINX is a large-scale benchmark of 100K interactions across 2,300 expert demonstrations of conversational web navigation. It covers a broad range of patterns on over 150 real-world websites and can be used to train and evaluate agents in diverse scenarios.

2 PAPERS • 1 BENCHMARK

robo-vln (Robotics Vision-and-Language Navigation)

The Robo-VLN dataset is a continuous control formulation of the VLN-CE dataset by Krantz et al., which was itself ported over from the Room-to-Room (R2R) dataset created by Anderson et al. Details on converting the discrete VLN dataset into a continuous control formulation can be found in our paper.

2 PAPERS • 1 BENCHMARK
