Mish: A Self Regularized Non-Monotonic Activation Function

BMVC 2020 · Diganta Misra

We propose $\textit{Mish}$, a novel self-regularized non-monotonic activation function which can be mathematically defined as $f(x) = x\tanh(\mathrm{softplus}(x))$. As activation functions play a crucial role in the performance and training dynamics of neural networks, we validate Mish experimentally on several well-known benchmarks against the best combinations of architectures and activation functions. We also observe that data augmentation techniques have a favorable effect on benchmarks like ImageNet-1k and MS-COCO across multiple architectures. For example, Mish outperformed Leaky ReLU on YOLOv4 with a CSP-DarkNet-53 backbone by 2.1\% average precision ($AP_{50}^{val}$) on MS-COCO object detection, and outperformed ReLU on ResNet-50 by $\approx$1\% Top-1 accuracy on ImageNet-1k, while keeping all other network parameters and hyperparameters constant. Furthermore, we explore the mathematical formulation of Mish in relation to the Swish family of functions and propose an intuitive understanding of how its first-derivative behavior may act as a regularizer that aids the optimization of deep neural networks. Code is publicly available at https://github.com/digantamisra98/Mish.
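For readers who want to try the activation directly, the following is a minimal sketch of Mish in PyTorch, written straight from the definition $f(x) = x\tanh(\mathrm{softplus}(x))$ above. It is not the reference implementation from the linked repository, and the `Mish` module name and the drop-in replacement of `nn.ReLU` suggested in the comments are only illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def mish(x: torch.Tensor) -> torch.Tensor:
    # Mish as defined in the abstract: f(x) = x * tanh(softplus(x)),
    # where softplus(x) = ln(1 + exp(x)) (F.softplus is numerically stable).
    return x * torch.tanh(F.softplus(x))


class Mish(nn.Module):
    """Module wrapper so Mish can stand in for activations such as nn.ReLU
    inside an existing architecture (illustrative sketch only)."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return mish(x)


if __name__ == "__main__":
    # Quick sanity check: Mish(0) = 0, negative inputs are bounded below,
    # and large positive inputs pass through nearly unchanged.
    x = torch.tensor([-3.0, -1.0, 0.0, 1.0, 3.0])
    print(Mish()(x))
```

Newer PyTorch releases also ship a built-in `torch.nn.Mish`, which can be used in place of a hand-rolled module like the one above.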


Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|------|---------|-------|-------------|--------------|-------------|
| Image Classification | CIFAR-10 | ResNet 9 + Mish | Percentage correct | 94.05 | #148 |
| Image Classification | CIFAR-10 | ResNet v2-20 (Mish activation) | Percentage correct | 92.02 | #168 |
| Image Classification | CIFAR-100 | ResNet v2-110 (Mish activation) | Percentage correct | 74.41 | #148 |
| Image Classification | ImageNet | CSPResNeXt-50 + Mish | Top 1 Accuracy | 79.8% | #676 |
