A Billion-scale Foundation Model for Remote Sensing Images

11 Apr 2023 · Keumgang Cha, Junghoon Seo, Taekyung Lee

As the potential of foundation models in visual tasks has garnered significant attention, pretraining these models before downstream tasks has become a crucial step. The three key factors in pretraining foundation models are the pretraining method, the size of the pretraining dataset, and the number of model parameters. Recently, research in the remote sensing field has focused primarily on the pretraining method and the size of the dataset, with limited emphasis on the number of model parameters. This paper addresses this gap by examining the effect of increasing the number of model parameters on the performance of foundation models in downstream tasks such as rotated object detection and semantic segmentation. We pretrained foundation models with varying numbers of parameters, including 86M, 605.26M, 1.3B, and 2.4B, to determine whether performance in downstream tasks improves as the parameter count grows. To the best of our knowledge, this is the first billion-scale foundation model in the remote sensing field. Furthermore, we propose an effective method for scaling up and fine-tuning a vision transformer in the remote sensing field. To evaluate general performance in downstream tasks, we employed the DOTA v2.0 and DIOR-R benchmark datasets for rotated object detection, and the Potsdam and LoveDA datasets for semantic segmentation. Experimental results demonstrated that, across all benchmark datasets and downstream tasks, both the performance and the data efficiency of the foundation models improved as the number of parameters increased. Moreover, our models achieve state-of-the-art performance on several datasets, including DIOR-R, Potsdam, and LoveDA.
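The parameter scales quoted above follow roughly from how ViT encoder size grows quadratically with width and linearly with depth. Below is a minimal, illustrative Python sketch that estimates encoder parameter counts for standard published ViT shapes; the (width, depth, MLP-dim) triples are assumptions taken from the common ViT-B/H/g/G configurations, not the exact configurations trained in this paper (e.g., ViT-G12X4).

```python
# Minimal sketch: approximate ViT encoder parameter count,
# to illustrate how width/depth scaling drives model size.
# Configurations below are common published ViT shapes and are
# assumptions for illustration, not this paper's exact models.

def vit_encoder_params(width: int, depth: int, mlp_dim: int) -> int:
    """Rough per-encoder parameter count (ignores patch/pos embeddings and head)."""
    attn = 4 * width * width + 4 * width          # QKV + output projection (+ biases)
    mlp = 2 * width * mlp_dim + width + mlp_dim   # two linear layers (+ biases)
    norms = 4 * width                             # two LayerNorms per block
    return depth * (attn + mlp + norms)

configs = {
    "ViT-B": (768, 12, 3072),
    "ViT-H": (1280, 32, 5120),
    "ViT-g": (1408, 40, 6144),
    "ViT-G": (1664, 48, 8192),
}

for name, (w, d, m) in configs.items():
    print(f"{name}: ~{vit_encoder_params(w, d, m) / 1e6:.0f}M encoder params")
# Prints roughly 85M, 630M, 1010M, and 1841M, showing the
# ~86M-to-billion-scale progression the abstract describes.
```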


Results from the Paper


Ranked #1 on Semantic Segmentation on LoveDA (using extra training data)
| Task | Dataset | Model | Metric | Value | Global Rank | Uses Extra Training Data |
|---|---|---|---|---|---|---|
| Object Detection in Aerial Images | DIOR-R | ViT-G12X4 | mAP | 73.60 | #2 | |
| Semantic Segmentation | ISPRS Potsdam | ViT-G12X4 | Overall Accuracy | 92.58 | #2 | |
| Semantic Segmentation | ISPRS Potsdam | ViT-G12X4 | Mean F1 | 92.12 | #7 | |
| Semantic Segmentation | LoveDA | ViT-G12X4 | Category mIoU | 54.4 | #1 | Yes |
