TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Image Classification	ImageNet	Container	Top 1 Accuracy	82.7%	# 465
Image Classification	ImageNet	Container	Number of params	22.1M	# 564
Image Classification	ImageNet	Container	GFLOPs	8.1	# 273
Image Classification	ImageNet	Container-Light	Top 1 Accuracy	82%	# 530
Image Classification	ImageNet	Container-Light	Number of params	20M	# 536
Image Classification	ImageNet	Container-Light	GFLOPs	3.2	# 176

Container: Context Aggregation Network

2 Jun 2021 · Peng Gao, Jiasen Lu, Hongsheng Li, Roozbeh Mottaghi, Aniruddha Kembhavi

Convolutional neural networks (CNNs) are ubiquitous in computer vision, with a myriad of effective and efficient variations. Recently, Transformers, originally introduced in natural language processing, have been increasingly adopted in computer vision. While early adopters continue to employ CNN backbones, the latest networks are end-to-end CNN-free Transformer solutions. A recent surprising finding shows that a simple MLP-based solution without any traditional convolutional or Transformer components can produce effective visual representations. While CNNs, Transformers and MLP-Mixers may be considered as completely disparate architectures, we provide a unified view showing that they are in fact special cases of a more general method to aggregate spatial context in a neural network stack. We present the Container (CONText AggregatIon NEtwoRk), a general-purpose building block for multi-head context aggregation that can exploit long-range interactions à la Transformers while still exploiting the inductive bias of the local convolution operation, leading to the faster convergence speeds often seen in CNNs. In contrast to Transformer-based methods that do not scale well to downstream tasks that rely on larger input image resolutions, our efficient network, named Container-Light, can be employed in object detection and instance segmentation networks such as DETR, RetinaNet and Mask-RCNN to obtain an impressive detection mAP of 38.9, 43.8, 45.1 and mask mAP of 41.3, providing large improvements of 6.6, 7.3, 6.9 and 6.6 pts respectively, compared to a ResNet-50 backbone with a comparable compute and parameter size. Our method also achieves promising results on self-supervised learning compared to DeiT on the DINO framework. Code is released at https://github.com/allenai/container.
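The abstract states the unified view without giving a formula. One way to picture it, offered here as an illustrative sketch and not as the released implementation, is that each architecture builds an affinity matrix over token positions and then takes an affinity-weighted sum of token features. The sketch below assumes a single head, a 1D token sequence, and no learned projections. All function names and the `alpha`/`beta` mixing weights are hypothetical, chosen only for illustration.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax_rows(m):
    """Row-wise softmax, so every row of the affinity matrix sums to 1."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def dynamic_affinity(x):
    """Transformer-style: affinities are computed from the input itself.
    Assumption: queries and keys are the raw features (no learned projections)."""
    d = len(x[0])
    scores = matmul(x, [list(col) for col in zip(*x)])  # x @ x^T
    return softmax_rows([[s / math.sqrt(d) for s in row] for row in scores])

def local_static_affinity(n, kernel):
    """Convolution-style: fixed weights shared across positions and
    nonzero only inside a local window (1D, zero padding)."""
    r = len(kernel) // 2
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k, w in enumerate(kernel):
            j = i + k - r
            if 0 <= j < n:
                a[i][j] = w
    return a

def dense_static_affinity(weights):
    """MLP-Mixer-style: a fixed dense token-mixing matrix, independent of the input."""
    return [list(row) for row in weights]

def mixed_affinity(x, kernel, alpha, beta):
    """Hypothetical blend of dynamic and local static affinities."""
    dyn = dynamic_affinity(x)
    loc = local_static_affinity(len(x), kernel)
    return [[alpha * d + beta * s for d, s in zip(dr, sr)] for dr, sr in zip(dyn, loc)]

def aggregate(affinity, values):
    """Shared step: each output token is an affinity-weighted sum of the values."""
    return matmul(affinity, values)

if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]  # 4 tokens, 2 channels
    kernel = [0.25, 0.5, 0.25]                                 # illustrative smoothing kernel
    uniform = dense_static_affinity([[0.25] * 4 for _ in range(4)])
    print(aggregate(dynamic_affinity(tokens), tokens))
    print(aggregate(local_static_affinity(4, kernel), tokens))
    print(aggregate(uniform, tokens))
    print(aggregate(mixed_affinity(tokens, kernel, 0.5, 0.5), tokens))
```

In this sketch only the construction of the affinity matrix differs between the three styles; `aggregate` is identical for all of them. Setting `alpha=1, beta=0` or `alpha=0, beta=1` in `mixed_affinity` recovers the purely dynamic or purely local case.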

Code

allenai/container (official)

gaopengcuhk/Container (official)

alrhub/arturo

philippdahlinger/ltsgns_ai4science

Tasks

Image Classification

Inductive Bias

Instance Segmentation

Object Detection

Self-Supervised Learning

Semantic Segmentation

Datasets

ImageNet

Results from the Paper

Ranked #465 on Image Classification on ImageNet

Methods

1x1 Convolution • Absolute Position Encodings • Adam • Attention Dropout • BPE • Convolution • DeiT • Dense Connections • Dropout • Feedforward Network • Focal Loss • FPN • Label Smoothing • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • RetinaNet • Scaled Dot-Product Attention • Softmax • Transformer • Vision Transformer
