Rethinking Visual Geo-localization for Large-Scale Applications

CVPR 2022 · Gabriele Berton, Carlo Masone, Barbara Caputo

Visual Geo-localization (VG) is the task of estimating the position where a given photo was taken by comparing it with a large database of images of known locations. To investigate how existing techniques would perform on a real-world city-wide VG application, we build San Francisco eXtra Large, a new dataset covering a whole city and providing a wide range of challenging cases, with a size 30x bigger than the previous largest dataset for visual geo-localization. We find that current methods fail to scale to such large datasets, therefore we design a new highly scalable training technique, called CosPlace, which casts the training as a classification problem avoiding the expensive mining needed by the commonly used contrastive learning. We achieve state-of-the-art performance on a wide range of datasets and find that CosPlace is robust to heavy domain changes. Moreover, we show that, compared to the previous state-of-the-art, CosPlace requires roughly 80% less GPU memory at train time, and it achieves better results with 8x smaller descriptors, paving the way for city-wide real-world visual geo-localization. Dataset, code and trained models are available for research purposes at https://github.com/gmberton/CosPlace.
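The retrieval setting described in the abstract, and the Recall@N metric reported for it, can be illustrated with a small sketch. This is not the CosPlace code: the function names are hypothetical, cosine similarity is assumed as the descriptor comparison, positions are assumed to be planar coordinates in metres, and the 25 m success threshold is an assumption that this page does not state. Under those assumptions, a query counts as correctly localized if any of its top-N retrieved database images lies within the threshold of the query's true position.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_desc, db_descs, top_n):
    """Return indices of the top_n database descriptors most similar to the query."""
    ranked = sorted(
        range(len(db_descs)),
        key=lambda i: cosine_similarity(query_desc, db_descs[i]),
        reverse=True,
    )
    return ranked[:top_n]


def recall_at_n(query_descs, query_positions, db_descs, db_positions, n,
                threshold_m=25.0):
    """Percentage of queries with at least one top-n match within threshold_m.

    threshold_m is an assumed value, not one taken from this page.
    """
    hits = 0
    for desc, pos in zip(query_descs, query_positions):
        candidates = retrieve(desc, db_descs, n)
        if any(math.dist(pos, db_positions[i]) <= threshold_m for i in candidates):
            hits += 1
    return 100.0 * hits / len(query_descs)


# Toy data (made up for illustration): 2-D descriptors and positions in metres.
db_descs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
db_positions = [(0.0, 0.0), (100.0, 0.0), (200.0, 0.0)]
query_descs = [[0.9, 0.1], [0.6, 0.8]]
query_positions = [(10.0, 0.0), (100.0, 5.0)]

recall_1 = recall_at_n(query_descs, query_positions, db_descs, db_positions, n=1)
recall_2 = recall_at_n(query_descs, query_positions, db_descs, db_positions, n=2)
print(recall_1, recall_2)  # prints: 50.0 100.0
```

In the toy data the second query's best match is far from its true position, but its second-best match is close, so it counts as a miss at N=1 and a hit at N=2. A real system would replace the exhaustive sort with an efficient nearest-neighbour search over learned descriptors.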

Task: Visual Place Recognition (all rows)

Dataset               Model                         Metric Name  Metric Value  Global Rank
Gardens Point         CosPlace                      Recall@1     74.00         # 4
Hawkins               CosPlace                      Recall@1     31.36         # 5
Mapillary val         CosPlace (ResNet-101 2048-D)  Recall@1     86.7          # 4
Mapillary val         CosPlace (ResNet-101 2048-D)  Recall@5     92.1          # 4
Mapillary val         CosPlace (ResNet-101 2048-D)  Recall@10    93.4          # 4
Mapillary val         CosPlace                      Recall@5     89.9          # 8
Mapillary val         CosPlace                      Recall@10    91.8          # 7
MSLS                  CosPlace                      Recall@1     79.6          # 2
Nardo-Air R           CosPlace                      Recall@1     91.55         # 2
Pittsburgh-250k-test  CosPlace                      Recall@1     91.5          # 6
Pittsburgh-250k-test  CosPlace                      Recall@5     96.9          # 6
Pittsburgh-250k-test  CosPlace                      Recall@10    97.9          # 6
Pittsburgh-30k-test   CosPlace (ResNet-101 2048-D)  Recall@1     90.4          # 5
Pittsburgh-30k-test   CosPlace (ResNet-101 2048-D)  Recall@5     95.7          # 3
Pittsburgh-30k-test   CosPlace (ResNet-101 2048-D)  Recall@10    96.7          # 2
Pittsburgh-30k-test   CosPlace                      Recall@1     90.45         # 4
SF-XL test v1         CosPlace                      Recall@1     64.7          # 2
SF-XL test v1         CosPlace                      Recall@5     73.3          # 1
SF-XL test v1         CosPlace                      Recall@10    76.6          # 1
SF-XL test v2         CosPlace                      Recall@1     83.4          # 1
SF-XL test v2         CosPlace                      Recall@5     91.6          # 1
SF-XL test v2         CosPlace                      Recall@10    94.1          # 1
St Lucia              CosPlace                      Recall@1     99.59         # 3
St Lucia              CosPlace                      Recall@5     99.9          # 1
St Lucia              CosPlace                      Recall@10    99.9          # 1
Tokyo247              CosPlace                      Recall@1     82.2          # 4
Tokyo247              CosPlace (ResNet-101 2048-D)  Recall@5     95.9          # 2
Tokyo247              CosPlace (ResNet-101 2048-D)  Recall@10    96.5          # 2
VP-Air                CosPlace                      Recall@1     8.12          # 6

Results from Other Papers


Task: Visual Place Recognition (all rows)

Dataset                  Model     Metric Name  Metric Value  Rank
17 Places                CosPlace  Recall@1     61.08         # 6
Baidu Mall               CosPlace  Recall@1     41.62         # 7
Laurel Caverns           CosPlace  Recall@1     24.11         # 7
Mid-Atlantic Ridge       CosPlace  Recall@1     20.79         # 7
Nardo-Air                CosPlace  Recall@1     0             # 7
Oxford RobotCar Dataset  CosPlace  Recall@1     91.10         # 2
